mirror of
https://github.com/janishutz/eth-summaries.git
synced 2026-01-12 20:28:31 +00:00
[SPCA] Restructuring, finish memory management in C, start dynamic memory management section
17
semester3/spca/parts/01_c/00_intro.tex
Normal file
@@ -0,0 +1,17 @@
|
||||
\begin{scriptsize}
|
||||
\textit{I can clearly C why you'd want to use C. Already sorry in advance for all the bad C jokes that are going to be part of this section}
|
||||
\end{scriptsize}
|
||||
|
||||
\texttt{C} is a compiled, low-level programming language. It lacks many features that modern high-level programming languages offer, such as object-oriented programming,
true functional programming (as implemented by Haskell), garbage collection, complex abstract data types and vectors, just to name a few.
(Some of these can be approximated using preprocessor macros, more on this later.)
|
||||
|
||||
On the other hand, it offers low-level hardware access, the ability to directly integrate assembly code into the \texttt{.c} files,
|
||||
as well as bit level data manipulation and extensive memory management options, again just to name a few.
|
||||
|
||||
Because it is so close to the hardware, \texttt{C} performs excellently. There are even many programming languages whose compilers do not directly produce machine code or assembly,
but instead optimized \texttt{C} code, which is then compiled into machine code by a \texttt{C} compiler.
This has a number of benefits, most notably that \texttt{C} compilers can produce very efficient assembly,
as the hardware manufacturers put a lot of effort into them.
|
||||
|
||||
There are many great \lC\ tutorials out there; a simple one (as for many other languages, too) can be found \hlhref{https://www.w3schools.com/c/index.php}{here}.
|
||||
12
semester3/spca/parts/01_c/01_basics/00_intro.tex
Normal file
@@ -0,0 +1,12 @@
|
||||
\subsection{Basics}
|
||||
\texttt{C} uses a syntax very similar to that of many other programming languages, like \texttt{Java}, \texttt{JavaScript} and many more\dots
To be precise, it is \textit{they} that use the \texttt{C} syntax, not the other way around. So:
|
||||
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{00_intro.c}
|
||||
|
||||
In \texttt{C}, we refer to the implementation of a function as a \bi{(function) definition} (correspondingly, \textit{variable definition}, if the variable is initialized)
and to the specification of the function signature (or of a variable, without initializing it) as the \bi{(function) declaration} (or, correspondingly, \textit{variable declaration}).

\texttt{C} code is usually split into source files, ending in \texttt{.c} (which contain the declarations of local functions and variables, as well as all function definitions),
and header files, ending in \texttt{.h} and usually sharing the filename of the source file, which contain the external declarations.
By convention, header files contain neither function definitions nor variable definitions, but there is nothing preventing you from putting them there.
|
||||
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{01_func.h}
|
||||
14
semester3/spca/parts/01_c/01_basics/01_control-flow.tex
Normal file
@@ -0,0 +1,14 @@
|
||||
\newpage
|
||||
\subsubsection{Control Flow}
|
||||
Many of the control-flow structures of \texttt{C} can be found in the code snippet below.
A note of caution on \texttt{goto}: using it is almost never a good idea (it can lead to unexpected behaviour, is hard to maintain, etc).
It is, however, very handy for error recovery (and cleanup code) and for early termination of nested loops (jumping out of several loops at once).
For example, if you have to call multiple functions to set something up and one of them fails,
you can jump to a label and have all the cleanup code that you placed there execute.
And because labels (as in Assembly) do not affect execution, i.e. control simply falls through them, this allows for very clean cleanup code.
We can also use \texttt{continue} and \texttt{break} statements similarly to \texttt{Java}; they do not, however, accept labels.
(Reminder: \texttt{continue} skips the rest of the loop body and goes to the next iteration)
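
The cleanup pattern described above could look roughly as follows. This is only a minimal sketch: the function \texttt{setup\_buffers} and its parameter \texttt{fail\_second} (which merely simulates a failing setup step) are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical setup routine: needs two buffers to do its work.
 * Returns 0 on success and -1 on failure; whatever was already
 * allocated is released again in both cases. */
int setup_buffers(size_t n, int fail_second) {
    int status = -1;                       /* pessimistic default */
    char *a = NULL;
    char *b = NULL;

    a = malloc(n);
    if (a == NULL)
        goto out;                          /* nothing to clean up yet */

    b = fail_second ? NULL : malloc(n);    /* simulate a failing second step */
    if (b == NULL)
        goto free_a;                       /* only a has to be released */

    /* ... work with a and b ... */
    status = 0;

    free(b);                               /* success path falls through the labels */
free_a:
    free(a);
out:
    return status;
}
```

Note how the success path simply falls through both labels, so the cleanup code exists exactly once.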
|
||||
|
||||
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{01_func.c}
|
||||
|
||||
|
||||
86
semester3/spca/parts/01_c/01_basics/02_declarations.tex
Normal file
@@ -0,0 +1,86 @@
|
||||
\newpage
|
||||
\subsubsection{Declarations}
|
||||
We have already seen a few examples of how \texttt{C} handles declarations.
Conceptually, they are similar to those of most other \texttt{C}-like programming languages, including \texttt{Java} (and scoping works the same).
|
||||
|
||||
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{02_declarations.c}
|
||||
|
||||
\newpage
|
||||
A peculiarity of \texttt{C} is that the size of the basic data types is not fixed by the language, but depends on the platform the code is compiled for.
|
||||
\rmvspace
|
||||
|
||||
\begin{fullTable}{llll}{\texttt{C} data type & typical 32-bit & ia32 & x86-64}{Comparison of byte-sizes for each datatype on different architectures}
|
||||
\texttt{char} & 1 & 1 & 1 \\
|
||||
\texttt{short} & 2 & 2 & 2 \\
|
||||
\texttt{int} & 4 & 4 & 4 \\
|
||||
\texttt{long} & 4 & 4 & 8 \\
|
||||
\texttt{long long} & 8 & 8 & 8 \\
|
||||
\texttt{float} & 4 & 4 & 4 \\
|
||||
\texttt{double} & 8 & 8 & 8 \\
|
||||
\texttt{long double} & 8 & 10/12 & 16 \\
|
||||
\end{fullTable}
|
||||
|
||||
\drmvspace
|
||||
\warn{Type format} Be aware, however, that the x86-64 column of this table uses the \texttt{LP64} data model,
which is the one used by Unix-like systems (i.e. Linux, BSD, Darwin (the Mac kernel)).
64-bit Windows, however, uses \texttt{LLP64}, i.e. \texttt{int} and \texttt{long} have the same size (32 bits) and \texttt{long long} and pointers are 64 bits wide.
|
||||
|
||||
|
||||
\content{Integers} By default, integers in \lC\ are \texttt{signed}; to declare an unsigned integer, use \texttt{unsigned int}.
Since it is hard and annoying to remember the number of bytes of each data type, \texttt{C99} introduced fixed-width integer types,
which are declared in \texttt{stdint.h} and are of the form \texttt{int<bit count>\_t} and \texttt{uint<bit count>\_t},
where we substitute \texttt{<bit count>} with the number of bits (which of course has to correspond to a valid type, typically 8, 16, 32 or 64).
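
A small sketch of how the fixed-width types behave. The helper names \texttt{bits\_of\_int32} and \texttt{next\_u8} are made up for this example.

```c
#include <limits.h>
#include <stdint.h>

/* int32_t has 32 bits on every platform that provides it,
 * unlike long, whose size depends on the data model. */
unsigned bits_of_int32(void) {
    return (unsigned)(sizeof(int32_t) * CHAR_BIT);
}

/* An 8 bit unsigned counter wraps around from 255 to 0. */
uint8_t next_u8(uint8_t x) {
    return (uint8_t)(x + 1u);
}
```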
|
||||
|
||||
|
||||
\content{Booleans} Another notable difference of \texttt{C} compared to other languages is that \texttt{C} originally had no native \texttt{boolean} type.
By convention, an integer (usually \texttt{int}) is used to represent it, where any non-zero value means \texttt{true} and \texttt{0} means \texttt{false}.
Correspondingly, the negation operator \texttt{!} turns any non-zero value of any integer type into \texttt{0} and \texttt{0} into \texttt{1}.
\texttt{C99} added support for a \texttt{bool} type via \texttt{stdbool.h}, which however is still an integer type.
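
As a brief illustration of these truth-value rules (the function names \texttt{to\_truth\_value} and \texttt{is\_set} are hypothetical):

```c
#include <stdbool.h>

/* Double negation normalises any integer to exactly 0 or 1. */
int to_truth_value(int x) {
    return !!x;
}

/* Converting to bool performs the same normalisation:
 * any non-zero result of flags & mask becomes true (1). */
bool is_set(int flags, int mask) {
    return flags & mask;
}
```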
|
||||
|
||||
|
||||
\content{Implicit casts} Notably, \texttt{C} doesn't have a very rigid type system and lower bit-count types are implicitly converted to higher bit-count data types, i.e.
if you add a \texttt{short} and an \texttt{int}, the \texttt{short} is converted to \texttt{int} (sign-extended, i.e. bits 16-31 are filled with copies of the sign bit) and the two are added.
Explicit casting between almost all types is also supported.
Some casts force a change of the bit representation, but most don't (notably, casts to and from \texttt{float}-like types do change it, whereas a cast to \texttt{void} simply discards the value).
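
The following sketch makes these conversions visible. It assumes that \texttt{float} is a 32-bit IEEE 754 number (true on the platforms in the table above); all function names are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* A signed 16 bit value is sign-extended when widened to 32 bits. */
int32_t widen_signed(int16_t s) {
    return s;
}

/* An unsigned 16 bit value is zero-extended instead. */
uint32_t widen_unsigned(uint16_t u) {
    return u;
}

/* int -> float converts the value, so the bit pattern changes.
 * Returns the raw bits of (float)i; assumes a 32 bit IEEE 754 float. */
uint32_t float_bits_of(int32_t i) {
    float f = (float)i;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined way to inspect the bits */
    return bits;
}
```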
|
||||
|
||||
|
||||
\content{Expressions} In \lC, an assignment is also an expression (i.e. it yields a value), see above code block for an example.


\content{Void} The \texttt{void} type has \bi{no} value and is used for untyped pointers and for declaring functions with no return value.
|
||||
|
||||
|
||||
\content{Structs} Are like classes in OOP, but they contain no logic.
We can copy a struct by assignment and they behave just like everything else in \texttt{C} when used as an argument for functions
in that they are passed by value and not by reference.
You can of course also pass it by reference (like any other data type) by declaring the parameter as type \texttt{struct mystruct * name} and then calling the function using
\texttt{func(\&test)}, assuming \texttt{test} is the name of your struct variable.
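
A minimal sketch of the difference between the two ways of passing a struct. The type \texttt{struct point} and both functions are hypothetical examples.

```c
struct point {
    int x;
    int y;
};

/* Receives a copy: changes are invisible to the caller. */
int moved_x_of_copy(struct point p) {
    p.x += 10;
    return p.x;
}

/* Receives a pointer: the caller's struct is modified. */
void move_in_place(struct point *p) {
    p->x += 10;    /* p->x is shorthand for (*p).x */
}
```

Calling \texttt{moved\_x\_of\_copy(a)} leaves \texttt{a} untouched, whereas \texttt{move\_in\_place(\&a)} changes \texttt{a.x}.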
|
||||
|
||||
|
||||
\content{Typedef} A custom type name is defined using \texttt{typedef <type it represents> <name of the new type>}.

You may also use \texttt{typedef} on structs using \texttt{typedef struct <struct tag> <name of the new alias>}.
Thus, instead of e.g. \verb|struct list_el my_list;| you can write \verb|list my_list;|, if you have used \verb|typedef struct list_el list;| before.
It is even possible to do this:
|
||||
\drmvspace
|
||||
\begin{code}{c}
typedef struct list_el {
    unsigned long val;
    struct list_el *next;
} list_el;

struct list_el my_list;
list_el my_other_list;
\end{code}
|
||||
\rmvspace
|
||||
|
||||
\content{Namespaces}
|
||||
\lC\ has a few different namespaces, i.e. the same identifier can be used once in each namespace (e.g. you can have both \texttt{struct a} and \texttt{int a}).
|
||||
The following namespaces were covered:
|
||||
\rmvspace
|
||||
\begin{itemize}[noitemsep]
|
||||
\item Label names (used for \texttt{goto})
|
||||
\item Tags (for \texttt{struct}, \texttt{union} and \texttt{enum})
|
||||
\item Member names: one namespace for each \texttt{struct} and \texttt{union}
\item Mostly everything else (types, variable names, enumeration constants, etc, including typedef)
|
||||
\end{itemize}
|
||||
46
semester3/spca/parts/01_c/01_basics/03_operators.tex
Normal file
@@ -0,0 +1,46 @@
|
||||
\newpage
|
||||
\subsubsection{Operators}
|
||||
The list of operators in \lC\ is similar to the one of \texttt{Java}, etc.
|
||||
In Table \ref{tab:c-operators}, you can see an overview of the operators, sorted by precedence in descending order.
|
||||
You may notice that the \verb|&| and \verb|*| operators appear twice. The higher precedence occurrence is the address operator and dereference, respectively,
|
||||
and the lower precedence is \texttt{bitwise and} and \texttt{multiplication}, respectively.
|
||||
|
||||
The boolean operators \verb|&&| and \texttt{||}, as well as the ternary operator and the assignment operators, have very low precedence.
|
||||
\begin{table}[h!]
|
||||
\begin{tables}{ll}{Operator & Associativity}
|
||||
\texttt{() [] -> .} & Left-to-right \\
|
||||
\verb|! ~ ++ -- + - * & (type) sizeof| & Right-to-left \\
|
||||
\verb|* / %| & Left-to-right \\
|
||||
\verb|+ -| & Left-to-right \\
|
||||
\verb|<< >>| & Left-to-right \\
|
||||
\verb|< <= >= >| & Left-to-right \\
|
||||
\verb|== !=| & Left-to-right \\
|
||||
\verb|&| (bitwise and) & Left-to-right \\
\verb|^| (bitwise xor) & Left-to-right \\
\texttt{|} (bitwise or) & Left-to-right \\
|
||||
\verb|&&| (boolean and) & Left-to-right \\
|
||||
\texttt{||} (boolean or) & Left-to-right \\
|
||||
\texttt{? :} (ternary) & Right-to-left \\
|
||||
\verb!= += -= *= /= %= &= ^= |= <<= >>=! & Right-to-left \\
|
||||
\verb|,| & Left-to-right \\
|
||||
\end{tables}
|
||||
\caption{\lC\ operators ordered in descending order by precedence}
|
||||
\label{tab:c-operators}
|
||||
\end{table}
|
||||
|
||||
\shade{blue}{Associativity}
|
||||
\begin{itemize}
|
||||
\item Left-to-right: $A + B + C \mapsto (A + B) + C$
|
||||
\item Right-to-left: \texttt{A += B += C} $\mapsto$ \texttt{A += (B += C)}
|
||||
\end{itemize}
|
||||
|
||||
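As expected, boolean and (\verb|&&|) as well as boolean or (\texttt{||}) support early termination (short-circuit evaluation): the right operand is only evaluated if the left operand does not already determine the result.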
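
Early termination and right-to-left associativity can be observed with a small sketch like the following. All function names here are hypothetical, and \texttt{probe} only exists to count how often it is called.

```c
#include <stddef.h>

/* Safe because && evaluates its right operand only if the left one
 * is true: p is never dereferenced when it is NULL. */
int is_positive(const int *p) {
    return p != NULL && *p > 0;
}

static int calls = 0;                 /* counts how often probe() ran */
static int probe(void) { calls++; return 1; }

/* || stops as soon as the result is known: probe() runs only if x == 0. */
int nonzero_or_probe(int x) {
    return x != 0 || probe();
}

int probe_calls(void) {
    return calls;
}

/* Assignment operators associate right-to-left:
 * a += b += c is parsed as a += (b += c). */
int right_to_left_demo(void) {
    int a = 1, b = 2, c = 3;
    a += b += c;                      /* b becomes 5, then a becomes 6 */
    return a * 10 + b;
}
```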
|
||||
|
||||
The ternary operator works as in other programming languages: \verb|result = expr ? res_true : res_false;|

As previously touched on, an assignment is also an expression, i.e. the following works
\mint{c}|printf("%s", x = foo(y)); // prints output of foo(y) and x has that value|

Pre-increment (\texttt{++i}, new value returned) and post-increment (\texttt{i++}, old value returned) are also supported by \lC.

\lC\ has an \texttt{assert} macro (from \texttt{assert.h}), but do not use it for error handling. The basic syntax is \texttt{assert( expr );}
|
||||
8
semester3/spca/parts/01_c/01_basics/04_arrays.tex
Normal file
@@ -0,0 +1,8 @@
|
||||
\newpage
|
||||
\subsubsection{Arrays}
|
||||
\label{sec:c-arrays}
|
||||
The \lC\ compiler does not do any array bounds checks! Thus, always check array bounds yourself.
Unlike in some other programming languages, arrays do \bi{not} have dynamic length.

The snippet below already includes some pointer arithmetic tricks. The variable \texttt{data} is a pointer to the first element of the array.
|
||||
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{03_arrays.c}
|
||||
6
semester3/spca/parts/01_c/01_basics/05_strings.tex
Normal file
@@ -0,0 +1,6 @@
|
||||
\subsubsection{Strings}
|
||||
\lC\ doesn't have a \texttt{string} data type, but rather, strings are represented (when using \texttt{ASCII}) as \texttt{char} arrays,
|
||||
with length of the array $n + 1$ (where $n$ is the number of characters of the string).
|
||||
The extra element is the termination character, called the \texttt{null character}, denoted \verb|\0|.
|
||||
To determine the actual length of the string (as the array holding it may be larger than the string itself), we can use \verb|strlen(str)| or its bounded variant \verb|strnlen(str, maxlen)| from \texttt{string.h}.
|
||||
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{04_strings.c}
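
To make the role of the terminating null character explicit, here is a minimal sketch of how a length function can be written by hand. \texttt{my\_strlen} is a hypothetical name; in real code one would use \texttt{strlen} from \texttt{string.h}.

```c
#include <stddef.h>
#include <string.h>

/* Length of a string: the number of chars before the terminating '\0'.
 * The size of the array holding the string is irrelevant. */
size_t my_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

For \texttt{char buf[16] = "Hi";} the array occupies 16 bytes, but the string length is 2, since \texttt{buf[2]} is the null character.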
|
||||
39
semester3/spca/parts/01_c/01_basics/06_integers.tex
Normal file
@@ -0,0 +1,39 @@
|
||||
\subsubsection{Integers in C}
|
||||
As a reminder, integers are encoded as follows in big endian notation, with $x_i$ being the $i$-th bit and $w$ being the number of bits used to represent the number:
|
||||
\begin{itemize}[noitemsep]
|
||||
\item \bi{Unsigned}: $\displaystyle \sum_{i = 0}^{w - 1} x_i \cdot 2^i$
|
||||
\item \bi{Signed}: $\displaystyle -x_{w - 1} \cdot 2^{w - 1} + \sum_{i = 0}^{w - 2} x_i \cdot 2^i$ (two's complement notation, with $x_{w - 1}$ being the sign-bit)
|
||||
\end{itemize}
|
||||
The minimum number representable is $0$ and $-2^{w - 1}$, respectively, whereas the maximum number representable is $2^w - 1$ and $2^{w - 1} - 1$.
|
||||
\verb|limits.h| defines constants for the minimum and maximum values of different types, e.g. \verb|ULONG_MAX| or \verb|LONG_MAX| and \verb|LONG_MIN|
|
||||
|
||||
We can use the shift operators to multiply and divide by powers of two. Shift operations are usually \textit{much} cheaper than multiplication and division.
Left shift (\texttt{u << k} in \lC) always fills with zeros and throws away the extra bits on the left (equivalent to multiplication by $2^k$),
whereas the behaviour of right shift (\texttt{u >> k} in \lC) depends on the type: for unsigned values it is a logical shift (fill with zeros, unsigned division by $2^k$),
whereas for negative signed values it is implementation-defined and usually an arithmetic shift
(fill with most significant bit, division by $2^k$. This however rounds incorrectly, see below).
|
||||
|
||||
Signed division using arithmetic right shifts has the issue of incorrect rounding when the number is $< 0$: the shift rounds towards $-\infty$, whereas division in \lC\ rounds towards $0$.
Instead, we compute $s / 2^k = (s + (2^k - 1)) \texttt{ >> } k$ for $s < 0$ and $s / 2^k = s \texttt{ >> } k$ for $s \geq 0$.
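
The biased shift can be sketched as follows. The function name \texttt{div\_pow2} is hypothetical, and the code assumes that \texttt{>>} on negative values is an arithmetic shift (implementation-defined, but the case for \texttt{gcc} and \texttt{clang}) and that $0 \leq k < 31$.

```c
/* Divides s by 2^k, rounding towards zero like the / operator.
 * Assumes an arithmetic right shift for negative s and 0 <= k < 31. */
int div_pow2(int s, int k) {
    int bias = (s < 0) ? (1 << k) - 1 : 0;   /* 2^k - 1 for negative s */
    return (s + bias) >> k;
}
```

For example, $-7 / 2 = -3$ in \lC, while a plain arithmetic shift of $-7$ by one position gives $-4$; with the bias we compute $(-7 + 1) \texttt{ >> } 1 = -3$.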
|
||||
|
||||
\bi{In expressions that mix signed and unsigned values, the signed values are implicitly cast to unsigned}

This can lead to all sorts of nasty exploits (e.g. provide $-1$ as the length argument to \texttt{memcpy} and watch it burn, this was an actual exploit in FreeBSD).
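
The following sketch shows the conversion at work. It is a simplified illustration in the spirit of such bugs, not the actual FreeBSD code; \texttt{BUF\_SIZE} and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 64   /* hypothetical buffer size */

/* s is converted to unsigned before the comparison,
 * so -1 turns into a huge positive number. */
int less_than(int s, unsigned u) {
    return s < u;
}

/* Flawed length check: accepts negative lengths. */
int check_signed_only(int len) {
    return len <= BUF_SIZE;
}

/* Fixed length check: rejects negative lengths as well. */
int check_fixed(int len) {
    return len >= 0 && len <= BUF_SIZE;
}

/* What a size_t parameter (like the one of memcpy) receives. */
size_t as_size(int len) {
    return (size_t)len;
}
```

A length of $-1$ passes the flawed check and then arrives at a function taking a \texttt{size\_t} as the largest representable size.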
|
||||
|
||||
\fhlc{Cyan}{Addition \& Subtraction}
|
||||
|
||||
A nice property of the two's complement notation is that addition and subtraction work exactly the same as for unsigned numbers, due to over- and underflow.
|
||||
This also obviously means that it implements modular arithmetic, i.e.
|
||||
\mrmvspace
|
||||
\begin{align*}
|
||||
\texttt{Add}_w (u, v) = u + v \text{ mod } 2^w \ \text{ and } \ \texttt{Sub}_w (u, v) = u - v \text{ mod } 2^w
|
||||
\end{align*}
|
||||
|
||||
\mrmvspace
|
||||
\fhlc{Cyan}{Multiplication \& Division}
|
||||
|
||||
Unsigned multiplication with addition forms a commutative ring.
|
||||
Again, it is doing modular arithmetic and
|
||||
\begin{align*}
|
||||
\texttt{UMult}_w (u, v) = u \cdot v \text{ mod } 2^w
|
||||
\end{align*}
|
||||
53
semester3/spca/parts/01_c/01_basics/07_pointers.tex
Normal file
@@ -0,0 +1,53 @@
|
||||
\newpage
|
||||
\subsubsection{Pointers}
|
||||
On loading of a program, the OS creates the virtual address space for the process, inspects the executable and loads the data to the right places in the address space,
|
||||
before other preparations like final linking and relocation are done.
|
||||
|
||||
Stack-based languages (supporting recursion) allocate stack in frames that contain local variables, return information and temporary space.
|
||||
When a procedure is entered, a stack frame is allocated and any necessary setup code is executed (like moving the stack pointer, see later). % TODO: Link to correct section
|
||||
When a procedure returns, the stack frame is deallocated and any necessary cleanup code is executed, before execution of the previous frame continues.
|
||||
|
||||
\bi{In \lC\ a pointer is a variable whose value is the memory address of another variable}
|
||||
|
||||
Of note is that if you print the address stored in a pointer \texttt{type * p;} (e.g. the address of a local variable), you will typically get different memory addresses on every run.
The (Linux) kernel randomizes the address space layout to prevent some common exploits.
|
||||
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{05_pointers.c}
|
||||
|
||||
\newpage
|
||||
\begin{scriptsize}
|
||||
Some pointer arithmetic has already appeared in section \ref{sec:c-arrays}, but same kind of content with better explanation can be found here
|
||||
\end{scriptsize}
|
||||
|
||||
\content{Pointer Arithmetic} Note that when doing pointer arithmetic, adding $1$ will move the pointer by \texttt{sizeof(type)} bytes.
|
||||
|
||||
You may use pointer arithmetic on whatever pointer you'd like (as long as it's not a null pointer).
This means you \textit{can} make an array wherever in memory you'd like.
The issue is just that you are likely to overwrite something, and that something might be something critical (like a saved stack pointer),
thus you will get \bi{undefined} behaviour! (This is, by the way, a common concept in \lC: if something isn't easy to make more flexible,
the docs simply state that you get undefined behaviour if you do not do as they say.
For example, if you pass \texttt{free} a pointer that is not the start of a \texttt{malloc}'d section, you get undefined behaviour. So\dots RTFM!)
|
||||
|
||||
As already seen in the section arrays (section \ref{sec:c-arrays}), we can use pointer arithmetic for accessing array elements.
|
||||
The array name is treated as a pointer to the first element of the array, except when:
|
||||
\begin{itemize}[noitemsep]
|
||||
\item it is operand of \texttt{sizeof} (return value is $n \cdot \texttt{sizeof(type)}$ with $n$ the number of elements)
|
||||
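\item its address is taken (then \texttt{\&a} has the same address value as \texttt{a}, but points to the whole array)
\item it is a string literal used to initialize a \texttt{char} array, as in \texttt{char a[] = "String";}. If we instead initialize a pointer with \texttt{char *b = "String";},
the \texttt{"String"} is stored in read-only memory (typically next to the code) and if we modify it through the pointer, we get undefined behaviour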
|
||||
\end{itemize}
|
||||
\shade{purple}{Fun fact}: \texttt{A[i]} is always rewritten to \texttt{*(A + i)} by the compiler.
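
The rules above can be tried out with a short sketch like this one (the function names \texttt{sum} and \texttt{stride\_bytes} are hypothetical):

```c
#include <stddef.h>

/* Sums n ints using pointer arithmetic only: p + 1 points to the
 * next int, i.e. sizeof(int) bytes further in memory. */
int sum(const int *data, size_t n) {
    int total = 0;
    for (const int *p = data; p != data + n; p++)
        total += *p;
    return total;
}

/* Distance in bytes between two neighbouring elements. */
size_t stride_bytes(const int *data) {
    return (size_t)((const char *)(data + 1) - (const char *)data);
}
```

With \texttt{int a[4]}, the call \texttt{sum(a, 4)} passes a pointer to the first element, whereas \texttt{sizeof a} inside the declaring function still yields the size of the whole array.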
|
||||
|
||||
\content{Function arguments} Another important aspect is passing by value or by reference.
You can pass every data type by reference (i.e. by passing a pointer to it); you cannot, however, pass an array by value (as an array is treated as a pointer, see above).
|
||||
|
||||
\content{Body-less loops}
|
||||
\rmvspace
|
||||
\begin{code}{c}
int x = 0;
while ( x++ < 10 ); // This is (of course) not a useful snippet, but shows the concept
\end{code}
|
||||
|
||||
\content{Function pointers}
|
||||
A function can be passed as an argument to another function using the usual address-of syntax with the \verb|&| symbol. The corresponding parameter is declared using
\verb|type (* name)(type arg1, ...)|
and the function is called using \verb|(*name)(arg1, ...)|.
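
A minimal sketch of a function taking a function pointer. The names \texttt{map\_in\_place}, \texttt{twice} and \texttt{negate} are invented for this example.

```c
#include <stddef.h>

/* Applies f to every element of data, in place. */
void map_in_place(int *data, size_t n, int (*f)(int x)) {
    for (size_t i = 0; i < n; i++)
        data[i] = (*f)(data[i]);   /* f(data[i]) is equivalent */
}

int twice(int x) { return 2 * x; }
int negate(int x) { return -x; }
```

It can be called as \texttt{map\_in\_place(v, 3, \&twice)}; writing just \texttt{twice} without the \verb|&| works as well, since a function name is converted to a pointer to the function.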
|
||||
28
semester3/spca/parts/01_c/02_preprocessor.tex
Normal file
@@ -0,0 +1,28 @@
|
||||
\newpage
|
||||
\subsection{The C preprocessor}
|
||||
To have \texttt{gcc} stop compilation after running through \texttt{cpp}, the \texttt{C preprocessor}, use \texttt{gcc -E <file name>}.

Imports in \lC\ are handled by the preprocessor: for each \verb|#include <file1.h>|, it simply copies the contents of that file (recursively) into the including file.

Depending on whether we use \verb|#include <file1.h>| or \verb|#include "file1.h"|, the preprocessor will search for the file in the system headers or (first) in the project directory, respectively.
Be wary of including files twice, as the preprocessor will recursively include all files (i.e. it will also include the files that our included files include).
|
||||
|
||||
The \lC\ preprocessor gives us what are called \texttt{preprocessor macros}, which have the format \verb|#define NAME SUBSTITUTION|.
|
||||
\rmvspace
|
||||
|
||||
\inputcodewithfilename{c}{code-examples/00_c/01_preprocessor/}{00_macros.c}
|
||||
|
||||
To avoid issues with the semicolon that follows a preprocessor macro wrapping several statements (e.g. inside an \texttt{if} \dots\ \texttt{else}), we can use a concept called semicolon swallowing.
For that, we wrap the statements in a \texttt{do \dots\ while(0)} loop, which forms a single statement that requires the trailing semicolon and is optimized away by the compiler.
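
As a sketch of semicolon swallowing, consider the following hypothetical macro \texttt{SWAP\_INT} (not taken from the lecture code) and a function \texttt{order} using it:

```c
/* Hypothetical macro that swaps two ints. The do { ... } while (0)
 * wrapper turns the three statements into one single statement
 * that needs a trailing ';'. */
#define SWAP_INT(a, b)      \
    do {                    \
        int tmp_ = (a);     \
        (a) = (b);          \
        (b) = tmp_;         \
    } while (0)

/* With a plain { ... } block instead of the wrapper, the ';' after
 * the macro call would end the if statement, and the else below
 * would be a syntax error. */
void order(int *lo, int *hi) {
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
    else
        (void)0;            /* nothing to do */
}
```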
|
||||
|
||||
There are also a number of predefined macros:
|
||||
\begin{itemize}[noitemsep]
|
||||
\item \verb|__FILE__|: Filename of processed file
|
||||
\item \verb|__LINE__|: Line number of this usage of macro
|
||||
\item \verb|__DATE__|: Date of processing
|
||||
\item \verb|__TIME__|: Time of processing
|
||||
\item \verb|__STDC__|: Set if ANSI Standard \lC\ compiler is used
|
||||
\item \verb|__STDC_VERSION__|: The version of Standard \lC\ being compiled
|
||||
\item \dots many more
|
||||
\end{itemize}
|
||||
In headers, we typically use \verb|#ifndef __FILENAME_H_| followed by a \verb|#define __FILENAME_H_| or the like (closed by an \verb|#endif| at the end of the file) to check if the header was already included before.
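
The effect of such an include guard can be simulated in a single file by pasting the same (hypothetical) header \texttt{point.h} twice, which is what the preprocessor effectively does on a double inclusion. The guard name \texttt{POINT\_H\_} is an arbitrary choice for this sketch.

```c
/* ---- first inclusion of the hypothetical point.h ---- */
#ifndef POINT_H_
#define POINT_H_

struct point {
    int x;
    int y;
};

#endif /* POINT_H_ */

/* ---- second inclusion: POINT_H_ is already defined, so everything
 * up to the #endif is skipped and struct point is not redefined ---- */
#ifndef POINT_H_
#define POINT_H_

struct point {
    int x;
    int y;
};

#endif /* POINT_H_ */
```

Without the guard, the second copy would redefine \texttt{struct point} and the compilation would fail.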
|
||||
26
semester3/spca/parts/01_c/03_memory/00_intro.tex
Normal file
@@ -0,0 +1,26 @@
|
||||
\subsection{Memory}
|
||||
In comparison to most other languages, \lC\ does not feature automatic memory management, but instead gives us full, manual control over memory.
|
||||
This of course has both advantages and disadvantages.
|
||||
|
||||
\rmvspace
|
||||
\inputcodewithfilename{c}{code-examples/00_c/02_memory/}{00_memory.c}
|
||||
\drmvspace
|
||||
|
||||
Notably, the type \texttt{size\_t} of the argument \texttt{sz} of \texttt{malloc}, \texttt{calloc} and \texttt{realloc} is an \texttt{unsigned} integer type
whose size differs depending on the hardware and software platform.
|
||||
|
||||
\texttt{malloc} keeps track of which blocks are allocated. If you give \texttt{free} a pointer that isn't the start of the memory region previously \texttt{malloc}'d,
|
||||
you get undefined behaviour.
|
||||
|
||||
\warn{Memory corruption} There are many ways to corrupt memory in \lC. The below code shows off a few of them:
|
||||
|
||||
\rmvspace
|
||||
\inputcodewithfilename{c}{code-examples/00_c/02_memory/}{01_mem-corruption.c}
|
||||
\drmvspace
|
||||
|
||||
\warn{Memory leaks} If we allocate memory, but never free it, we use more and more memory (the old memory becomes inaccessible once we lose the last pointer to it).
|
||||
|
||||
\content{Dynamic data structures} We build them using structs that contain a pointer to another struct (of the same type).
We have to allocate memory for each element and then store the pointer to it in another struct.
For a generic dynamic data structure, make the element's data a \texttt{void} pointer.
This in general is the concept used for functions operating on any data type.
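
A generic singly linked list along these lines could be sketched as follows. The names \texttt{struct node}, \texttt{push} and \texttt{free\_list} are hypothetical, and the list does not own the data it points to.

```c
#include <stdlib.h>

/* One list element; data is a void pointer so any type can be stored. */
struct node {
    void *data;
    struct node *next;
};

/* Prepends a new element and returns the new head (NULL if malloc fails). */
struct node *push(struct node *head, void *data) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = data;
    n->next = head;
    return n;
}

/* Frees all elements (not the data they point to), returns their count. */
size_t free_list(struct node *head) {
    size_t count = 0;
    while (head != NULL) {
        struct node *next = head->next;   /* read before freeing */
        free(head);
        head = next;
        count++;
    }
    return count;
}
```

Since \texttt{data} is a \texttt{void} pointer, the caller has to cast it back to the correct type before dereferencing it, e.g. \texttt{*(int *)n->data}.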
|
||||
37
semester3/spca/parts/01_c/03_memory/01_allocation.tex
Normal file
@@ -0,0 +1,37 @@
|
||||
\subsubsection{Dynamic Memory Allocation}
|
||||
Memory allocated with \texttt{malloc} is typically $8$- or $16$-byte aligned.
|
||||
|
||||
\content{Explicit vs. Implicit} In explicit memory management, the application does both the allocation \textit{and} the deallocation of memory,
whereas in implicit memory management, the application allocates the memory, but usually a \textit{Garbage Collector} (GC) frees it.

For some languages, like Rust, one would assume that they do implicit management, but Rust is a language using explicit management,
it's just that the \textit{compiler} and not the programmer decides when to allocate and when to deallocate.
|
||||
|
||||
\warn{Assumptions in this course} We assume that memory is \bi{word} addressed (= 8 Bytes).
|
||||
|
||||
\content{Goals} The allocator should have the highest possible throughput and at the same time the best possible memory utilization (i.e. waste as little memory as possible).
These two goals usually conflict, so we have to balance the two.
|
||||
|
||||
\numberingOff
|
||||
\inlinedef \bi{Aggregate payload} $P_k$: Sum of the payloads of all blocks that are allocated after request $k$, i.e. all \texttt{malloc}'d stuff minus all \texttt{free}'d stuff
|
||||
|
||||
\inlinedef \bi{Current heap size} $H_k$: Monotonically non-decreasing. Grows when \texttt{sbrk} system call is issued.
|
||||
|
||||
\inlinedef \bi{Peak memory utilization} $U_k = (\max_{i \leq k} P_i) / H_k$
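
As a small worked example of these definitions, the following sketch computes the peak memory utilization from a recorded trace. The function \texttt{peak\_utilization} and its array-based interface are assumptions made for illustration; a real allocator would track these numbers internally.

```c
#include <stddef.h>

/* Peak memory utilization after a trace of requests.
 * payload[i] is the aggregate payload after the i-th recorded request,
 * k is the number of recorded requests and heap_size is the current
 * heap size (both payload and heap size in bytes). */
double peak_utilization(const size_t *payload, size_t k, size_t heap_size) {
    size_t peak = 0;
    for (size_t i = 0; i < k; i++)
        if (payload[i] > peak)
            peak = payload[i];            /* maximum payload seen so far */
    return heap_size == 0 ? 0.0 : (double)peak / (double)heap_size;
}
```

For aggregate payloads of 16, 48 and 32 bytes and a heap of 64 bytes, the peak payload is 48 bytes, so the utilization is $48 / 64 = 0.75$, even though the current payload is only 32 bytes.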
|
||||
|
||||
|
||||
A big problem for the \texttt{free} function is to know how much memory to free without being told the size of the block to be freed.
|
||||
This is just one of many other implementation issues:
|
||||
\begin{itemize}
|
||||
\item How do we keep track of the free blocks? I.e. where and how large are they?
|
||||
\item What do we do with the extra space of a block when allocating a smaller block?
|
||||
\item How do we pick a block?
|
||||
\item How do we reinsert a freed block into the heap?
|
||||
\end{itemize}
|
||||
This all leads to an issue known as \bi{fragmentation}.

\inlinedef \bi{Internal Fragmentation}: Occurs if for a given block the payload (i.e. the requested size) is smaller than the block size.
This depends only on the pattern of previous requests and is thus easy to measure.

\inlinedef \bi{External Fragmentation}: There is enough aggregate free heap memory, but there isn't a single large enough free block available.
This depends on the pattern of future requests and is thus hard to measure.
|
||||