[SPCA] Restructuring, finish memory management in C, start dynamic memory management section

2026-03-14 17:00:05 +01:00 · 2026-01-07 11:54:37 +01:00
parent 2afa0ff161
commit 2c9921c6d1
23 changed files with 248 additions and 66 deletions
--- a/semester3/spca/parts/01_c/01_basics/00_intro.tex
+++ b/semester3/spca/parts/01_c/01_basics/00_intro.tex
@@ -0,0 +1,12 @@
+\subsection{Basics}
+\texttt{C} uses a syntax very similar to that of many other programming languages, like \texttt{Java}, \texttt{JavaScript} and many more\dots
+To be precise, it is \textit{them} that use the \texttt{C} syntax, not the other way around. So:
+\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{00_intro.c}
+
+In \texttt{C}, we refer to the implementation of a function as a \bi{(function) definition} (correspondingly, \textit{variable definition}, if the variable is initialized)
+and to the specification of the function signature (or of variables, without initializing them) as the \bi{(function) declaration} (or, correspondingly, \textit{variable declaration}).
+
+\texttt{C} code is usually split into source files, ending in \texttt{.c} (which contain all function definitions, as well as the declarations of local functions and variables)
+and header files, ending in \texttt{.h}, which usually share the filename of the source file and contain the external declarations.
+By convention, neither function definitions nor variable definitions are placed in \texttt{.h} files, but nothing prevents you from putting them there.
+\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{01_func.h}
--- a/semester3/spca/parts/01_c/01_basics/01_control-flow.tex
+++ b/semester3/spca/parts/01_c/01_basics/01_control-flow.tex
@@ -0,0 +1,14 @@
+\newpage
+\subsubsection{Control Flow}
+Many of the control-flow structures of \texttt{C} can be found in the code snippet below.
+A note of caution when using \texttt{goto}: It is almost never a good idea (it can lead to unexpected behaviour, is hard to maintain, etc).
+Where it is very handy, however, is error recovery (and cleanup code) and early termination of multiple nested loops (jumping out of a loop).
+So, for example, if you have to run multiple functions to set something up and one of them fails,
+you can jump to a label and have all the cleanup code that you have specified there execute.
+And because the labels are (as in Assembly) simply skipped over during execution, you can write very nice cleanup code.
+We can also use \texttt{continue} and \texttt{break} statements similarly to \texttt{Java}. They do not, however, accept labels.
+(Reminder: \texttt{continue} skips the rest of the loop body and goes to the next iteration)
+
+\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{01_func.c}
+
+
--- a/semester3/spca/parts/01_c/01_basics/02_declarations.tex
+++ b/semester3/spca/parts/01_c/01_basics/02_declarations.tex
@@ -0,0 +1,86 @@
+\newpage
+\subsubsection{Declarations}
+We have already seen a few examples for how \texttt{C} handles declarations.
+In concept they are similar (and scoping works the same) to most other \texttt{C}-like programming languages, including \texttt{Java}.
+
+\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{02_declarations.c}
+
+\newpage
+A peculiarity of \texttt{C} is that the bit-count is not defined by the language, but rather the hardware it is compiled for.
+\rmvspace
+
+\begin{fullTable}{llll}{\texttt{C} data type & typical 32-bit & ia32  & x86-64}{Comparison of byte-sizes for each datatype on different architectures}
+    \texttt{char}               & 1              & 1     & 1       \\
+    \texttt{short}              & 2              & 2     & 2       \\
+    \texttt{int}                & 4              & 4     & 4       \\
+    \texttt{long}               & 4              & 4     & 8       \\
+    \texttt{long long}          & 8              & 8     & 8       \\
+    \texttt{float}              & 4              & 4     & 4       \\
+    \texttt{double}             & 8              & 8     & 8       \\
+    \texttt{long double}        & 8              & 10/12 & 16      \\
+\end{fullTable}
+
+\drmvspace
+\warn{Type format} Be aware, however, that this table uses the \texttt{LP64} format for the x86-64 sizes,
+which is the format all UNIX systems use (i.e. Linux, BSD, Darwin (the Mac kernel)).
+64 bit Windows, however, uses \texttt{LLP64}, i.e. \texttt{int} and \texttt{long} have the same size (32 bit) and \texttt{long long} and pointers are 64 bit.
+
+
+\content{Integers} By default, integers in \lC\ are \texttt{signed}. To declare an unsigned integer, use \texttt{unsigned int}.
+Since it is hard and annoying to remember the number of bytes that are in each data type, \texttt{C99} has introduced fixed-width integer types,
+which can be included from \texttt{stdint.h} and are of the form \texttt{int<bit count>\_t} and \texttt{uint<bit count>\_t},
+where we substitute \texttt{<bit count>} with the number of bits (which has to correspond to a valid type, of course, e.g. 8, 16, 32 or 64).
+
+
+\content{Booleans} Another notable difference of \texttt{C} compared to other languages is that \texttt{C} doesn't natively have a \texttt{boolean} type.
+By convention, an integer type (usually \texttt{int}) is used to represent it, where any non-zero value means \texttt{true} and \texttt{0} means \texttt{false}.
+Since boolean types are quite handy, the \texttt{!} syntax for negation turns any non-zero value of any integer type into zero and zero into one.
+\texttt{C99} has added support for a \texttt{bool} type via \texttt{stdbool.h}, which however is still an integer.
+
+
+\content{Implicit casts} Notably, \texttt{C} doesn't have a very rigid type system and lower bit-count types are implicitly cast to higher bit-count data types, i.e.
+if you add a \texttt{short} and an \texttt{int}, the \texttt{short} is cast to \texttt{int} (bits 16-31 are filled with the sign bit, or with $0$ for an \texttt{unsigned short}) and the two are added.
+Explicit casting between almost all types is also supported.
+Most casts keep the bit representation; some force a change of it, notably casts to and from \texttt{float}-like types.
+
+
+\content{Expressions} Assignments in \lC\ are also expressions (they evaluate to the assigned value), see above code block for example.
+
+
+\content{Void} The \texttt{void} type has \bi{no} value and is used for untyped pointers and for declaring functions with no return value.
+
+
+\content{Structs} Are like classes in OOP, but they contain no logic.
+We can copy a struct by assignment and they behave just like everything else in \texttt{C} when used as an argument for functions,
+in that they are passed by value and not by reference.
+You can of course also pass it by reference (like any other data type) by setting the argument to type \texttt{struct mystruct * name} and then calling the function using
+\texttt{func(\&test)}, assuming \texttt{test} is the name of your struct.
+
+
+\content{Typedef} To define a custom type, use \texttt{typedef <type it represents> <name of the new type>}.
+
+You may also use \texttt{typedef} on structs using \texttt{typedef struct <struct tag> <name of the new alias>},
+so that instead of e.g. \verb|struct list_el my_list;| you can write \verb|list my_list;|, if you have used \verb|typedef struct list_el list;| before.
+It is even possible to do this:
+\drmvspace
+\begin{code}{c}
+    typedef struct list_el {
+        unsigned long val;
+        struct list_el *next;
+    } list_el;
+
+    struct list_el my_list;
+    list_el my_other_list;
+\end{code}
+\rmvspace
+
+\content{Namespaces}
+\lC\ has a few different namespaces, i.e. you can use the same name once in each namespace (e.g. you can have \texttt{struct a} and \texttt{int a} at the same time).
+The following namespaces were covered:
+\rmvspace
+\begin{itemize}[noitemsep]
+    \item Label names (used for \texttt{goto})
+    \item Tags (for \texttt{struct}, \texttt{union} and \texttt{enum})
+    \item Member names (one namespace for each \texttt{struct} and \texttt{union})
+    \item Mostly everything else (types, variable names, enumeration constants, etc, including typedef names)
+\end{itemize}
--- a/semester3/spca/parts/01_c/01_basics/03_operators.tex
+++ b/semester3/spca/parts/01_c/01_basics/03_operators.tex
@@ -0,0 +1,46 @@
+\newpage
+\subsubsection{Operators}
+The list of operators in \lC\ is similar to the one of \texttt{Java}, etc.
+In Table \ref{tab:c-operators}, you can see an overview of the operators, sorted by precedence in descending order.
+You may notice that the \verb|&| and \verb|*| operators appear twice. The higher precedence occurrence is the address operator and dereference, respectively,
+and the lower precedence is \texttt{bitwise and} and \texttt{multiplication}, respectively.
+
+The boolean operators \verb|&&| and \texttt{||}, as well as the ternary operator and the assignment operators, have very low precedence.
+\begin{table}[h!]
+    \begin{tables}{ll}{Operator                                  & Associativity}
+              \texttt{() [] -> .}                            & Left-to-right                \\
+              \verb|! ~ ++ -- + - * & (type) sizeof|         & Right-to-left  \\
+              \verb|* / %|                                   & Left-to-right  \\
+              \verb|+ -|                                     & Left-to-right  \\
+              \verb|<< >>|                                   & Left-to-right  \\
+              \verb|< <= >= >|                               & Left-to-right  \\
+              \verb|== !=|                                   & Left-to-right  \\
+              \verb|&| (bitwise and)                         & Left-to-right  \\
+              \verb|^| (bitwise xor)                         & Left-to-right  \\
+              \texttt{|} (bitwise or)                        & Left-to-right                \\
+              \verb|&&| (boolean and)                        & Left-to-right  \\
+              \texttt{||} (boolean or)                       & Left-to-right                \\
+              \texttt{? :} (ternary)                         & Right-to-left                \\
+              \verb|= += -= *= /= %= &= ^=||\verb|= <<= >>=| & Right-to-left  \\
+              \verb|,|                                       & Left-to-right  \\
+    \end{tables}
+    \caption{\lC\ operators ordered in descending order by precedence}
+    \label{tab:c-operators}
+\end{table}
+
+\shade{blue}{Associativity} 
+\begin{itemize}
+    \item Left-to-right: $A + B + C \mapsto (A + B) + C$
+    \item Right-to-left: \texttt{A += B += C} $\mapsto$ \texttt{A += (B += C)}
+\end{itemize}
+
+As one would expect, boolean and as well as boolean or support early termination (short-circuit evaluation).
+
+The ternary operator works as in other programming languages \verb|result = expr ? res_true : res_false;|
+
+As previously touched on, assignments are also expressions, i.e. the following works
+\mint{c}|printf("%s", x = foo(y)); // prints output of foo(y) and x has that value|
+
+Pre-increment (\texttt{++i}, new value returned) and post-increment (\texttt{i++}, old value returned) are also supported by \lC.
+
+\lC\ has an \texttt{assert} macro (from \texttt{assert.h}), but do not use it for error handling. The basic syntax is \texttt{assert( expr );}
--- a/semester3/spca/parts/01_c/01_basics/04_arrays.tex
+++ b/semester3/spca/parts/01_c/01_basics/04_arrays.tex
@@ -0,0 +1,8 @@
+\newpage
+\subsubsection{Arrays}
+\label{sec:c-arrays}
+The \lC\ compiler does not do any array bounds checks! Thus, always check array bounds yourself.
+Unlike in some other programming languages, arrays do \bi{not} have dynamic length.
+
+The snippet below already includes some pointer arithmetic tricks. The variable \texttt{data} is a pointer to the first element of the array.
+\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{03_arrays.c}
--- a/semester3/spca/parts/01_c/01_basics/05_strings.tex
+++ b/semester3/spca/parts/01_c/01_basics/05_strings.tex
@@ -0,0 +1,6 @@
+\subsubsection{Strings}
+\lC\ doesn't have a \texttt{string} data type, but rather, strings are represented (when using \texttt{ASCII}) as \texttt{char} arrays,
+with length of the array $n + 1$ (where $n$ is the number of characters of the string).
+The extra element is the termination character, called the \texttt{null character}, denoted \verb|\0|.
+To determine the actual length of the string (as it may be padded), we can use \verb|strnlen(str, maxlen)| from \texttt{string.h}.
+\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{04_strings.c}
--- a/semester3/spca/parts/01_c/01_basics/06_integers.tex
+++ b/semester3/spca/parts/01_c/01_basics/06_integers.tex
@@ -0,0 +1,39 @@
+\subsubsection{Integers in C}
+As a reminder, integers are encoded as follows in big endian notation, with $x_i$ being the $i$-th bit and $w$ being the number of bits used to represent the number:
+\begin{itemize}[noitemsep]
+    \item \bi{Unsigned}: $\displaystyle \sum_{i = 0}^{w - 1} x_i \cdot 2^i$
+    \item \bi{Signed}: $\displaystyle -x_{w - 1} \cdot 2^{w - 1} + \sum_{i = 0}^{w - 2} x_i \cdot 2^i$ (two's complement notation, with $x_{w - 1}$ being the sign-bit)
+\end{itemize}
+The minimum number representable is $0$ and $-2^{w - 1}$, respectively, whereas the maximum number representable is $2^w - 1$ and $2^{w - 1} - 1$.
+\verb|limits.h| defines constants for the minimum and maximum values of different types, e.g. \verb|ULONG_MAX| or \verb|LONG_MAX| and \verb|LONG_MIN|
+
+We can use the shift operators to multiply and divide by powers of two. Shift operations are usually \textit{much} cheaper than multiplication and division.
+Left shift (\texttt{u << k} in \lC) always fills with zeros and throws away the extra bits on the left (equivalent to multiplication by $2^k$),
+whereas right shift (\texttt{u >> k} in \lC) is a logical shift for unsigned values (fill with zeros, unsigned division by $2^k$)
+and implementation-defined for negative signed values, where it is usually an arithmetic shift
+(fill with most significant bit, division by $2^k$. This however rounds incorrectly, see below).
+
+Signed division using arithmetic right shifts has the issue of incorrect rounding when the number is $< 0$ (it rounds towards $-\infty$ instead of towards zero).
+Instead, we compute $s / 2^k$ as \texttt{(s + (1 << k) - 1) >> k} for $s < 0$ and as \texttt{s >> k} for $s \geq 0$.
+
+\bi{In expressions that mix signed and unsigned values, the signed values are implicitly cast to unsigned}
+
+This can lead to all sorts of nasty exploits (e.g. provide $-1$ as the length argument to \texttt{memcpy} and watch it burn, this was an actual exploit in FreeBSD).
+
+\fhlc{Cyan}{Addition \& Subtraction}
+
+A nice property of the two's complement notation is that addition and subtraction work exactly the same as in normal notation, due to over- and underflow.
+This also obviously means that it implements modular arithmetic, i.e.
+\mrmvspace
+\begin{align*}
+    \texttt{Add}_w (u, v) = u + v \text{ mod } 2^w \ \text{ and } \ \texttt{Sub}_w (u, v) = u - v \text{ mod } 2^w
+\end{align*}
+
+\mrmvspace
+\fhlc{Cyan}{Multiplication \& Division}
+
+Unsigned multiplication with addition forms a commutative ring.
+Again, it is doing modular arithmetic and
+\begin{align*}
+    \texttt{UMult}_w (u, v) = u \cdot v \text{ mod } 2^w
+\end{align*}
--- a/semester3/spca/parts/01_c/01_basics/07_pointers.tex
+++ b/semester3/spca/parts/01_c/01_basics/07_pointers.tex
@@ -0,0 +1,53 @@
+\newpage
+\subsubsection{Pointers}
+On loading of a program, the OS creates the virtual address space for the process, inspects the executable and loads the data to the right places in the address space,
+before other preparations like final linking and relocation are done.
+
+Stack-based languages (supporting recursion) allocate stack in frames that contain local variables, return information and temporary space.
+When a procedure is entered, a stack frame is allocated and any necessary setup code is executed (like moving the stack pointer, see later). % TODO: Link to correct section
+When a procedure returns, the stack frame is deallocated and any necessary cleanup code is executed, before execution of the previous frame continues.
+
+\bi{In \lC\ a pointer is a variable whose value is the memory address of another variable}
+
+Of note is that if you simply declare a pointer using \texttt{type * p;} and print its address, you will get a different memory address on every run of the program.
+The (Linux) kernel randomizes the address space to prevent some common exploits.
+\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{05_pointers.c}
+
+\newpage
+\begin{scriptsize}
+    Some pointer arithmetic has already appeared in section \ref{sec:c-arrays}, but the same kind of content with a better explanation can be found here
+\end{scriptsize}
+
+\content{Pointer Arithmetic} Note that when doing pointer arithmetic, adding $1$ will move the pointer by \texttt{sizeof(type)} bytes.
+
+You may use pointer arithmetic on whatever pointer you'd like (as long as it's not a null pointer).
+This means, you \textit{can} make an array wherever in memory you'd like.
+The issue is just that you are likely to overwrite something, and that something might be something critical (like a stack pointer),
+thus you will get \bi{undefined} behaviour! This is, by the way, a common concept in \lC: if something isn't easy to make more flexible,
+the docs simply mention that one gets undefined behaviour if one does not do as they say
+(an example for \texttt{malloc}: if you pass a pointer to memory that is not the start of the \texttt{malloc}'d section, you get undefined behaviour), so\dots RTFM!
+
+As already seen in the section arrays (section \ref{sec:c-arrays}), we can use pointer arithmetic for accessing array elements.
+The array name is treated as a pointer to the first element of the array, except when:
+\begin{itemize}[noitemsep]
+    \item it is the operand of \texttt{sizeof} (the return value is $n \cdot \texttt{sizeof(type)}$ with $n$ the number of elements)
+    \item its address is taken (\texttt{\&a} then has the same address value as \texttt{a}, but is a pointer to the whole array)
+    \item it is a string literal used to initialize a \texttt{char} array. By contrast, if we declare a pointer \texttt{char *b = "String";} to a string literal in code,
+          the \texttt{"String"} is stored in read-only memory and if we modify its characters through the pointer, we get undefined behaviour
+\end{itemize}
+\shade{purple}{Fun fact}: \texttt{A[i]} is always rewritten to \texttt{*(A + i)} by the compiler.
+
+\content{Function arguments} Another important aspect is passing by value or by reference.
+You can pass every data type by reference, you can not however pass an array by value (as an array is treated as a pointer, see above).
+
+\content{Body-less loops}
+\rmvspace
+\begin{code}{c}
+    int x = 0;
+    while ( x++ < 10 ); // This is (of course) not a useful snippet, but shows the concept
+\end{code}
+
+\content{Function pointers}
+A function can be passed as an argument to another function using the typical address syntax with the \verb|&| symbol. The corresponding parameter is declared using
+\verb|type (* name)(type arg1, ...)|
+and the function is called using \verb|(*func)(arg1, ...)|.