mirror of
https://github.com/janishutz/eth-summaries.git
synced 2026-01-13 02:38:25 +00:00
[SPCA] Restructuring, finish memory management in C, start dynamic memory management section
12
semester3/spca/parts/01_c/01_basics/00_intro.tex
Normal file
@@ -0,0 +1,12 @@
\subsection{Basics}
\texttt{C} uses a syntax very similar to that of many other programming languages, like \texttt{Java}, \texttt{JavaScript} and many more\dots
to be precise, it is \textit{they} that use the \texttt{C} syntax, not the other way around. So:
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{00_intro.c}

In \texttt{C} we refer to the implementation of a function as a \bi{(function) definition} (correspondingly, \textit{variable definition}, if the variable is initialized)
and to the statement of the function signature (or of a variable, without initializing it) as the \bi{(function) declaration} (or, correspondingly, \textit{variable declaration}).

\texttt{C} code is usually split into source files, ending in \texttt{.c} (which contain the declarations of local functions and variables, as well as all function definitions)
and header files, ending in \texttt{.h} and usually sharing the filename of the source file, which contain the external declarations.
By convention, neither function definitions nor variable definitions are placed in the \texttt{.h} files, but there is nothing preventing you from putting them there.
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{01_func.h}
14
semester3/spca/parts/01_c/01_basics/01_control-flow.tex
Normal file
@@ -0,0 +1,14 @@
\newpage
\subsubsection{Control Flow}
Many of the control-flow structures of \texttt{C} can be found in the below code snippet.
A note of caution when using \texttt{goto}: It is almost never a good idea (it can lead to unexpected behaviour, is hard to maintain, etc).
Where it is very handy, however, is for error recovery (and cleanup code) and early termination of multiple nested loops (jumping out of all of them at once).
So, for example, if you have to run multiple functions to set something up and one of them fails,
you can jump to a label and have all cleanup code execute that you have specified there.
And because the labels are (as in Assembly) simply skipped over during execution, you can make very nice cleanup code.
We can also use \texttt{continue} and \texttt{break} statements similarly to \texttt{Java}; they do not however accept labels.
(Reminder: \texttt{continue} skips the rest of the loop body and goes to the next iteration)

\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{01_func.c}
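
The error-recovery use of \texttt{goto} described above could be sketched as follows. The function \texttt{setup} and its two buffers are hypothetical and only serve as illustration; the point is that the cleanup labels are ordered so that execution falls through them and releases exactly what has been acquired so far.
\begin{code}{c}
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical setup routine: returns 0 on success, -1 on failure */
int setup(size_t n) {
    int ret = -1;
    int *a = NULL;
    int *b = NULL;

    if (n == 0) goto out;          /* nothing acquired yet */
    a = malloc(n * sizeof *a);
    if (a == NULL) goto out;
    b = malloc(n * sizeof *b);
    if (b == NULL) goto free_a;    /* only a has to be released */

    a[0] = 1;                      /* stand-in for the real work */
    b[0] = 2;
    ret = (a[0] + b[0] == 3) ? 0 : -1;

    free(b);
free_a:                            /* labels are simply fallen through */
    free(a);
out:
    return ret;
}

int main(void) {
    printf("%d\n", setup(4));      /* 0 if both allocations succeed */
    printf("%d\n", setup(0));      /* -1 */
    return 0;
}
\end{code}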
86
semester3/spca/parts/01_c/01_basics/02_declarations.tex
Normal file
@@ -0,0 +1,86 @@
\newpage
\subsubsection{Declarations}
We have already seen a few examples for how \texttt{C} handles declarations.
In concept they are similar (and scoping works the same) to most other \texttt{C}-like programming languages, including \texttt{Java}.

\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{02_declarations.c}

\newpage
A peculiarity of \texttt{C} is that the size of the basic data types is not fixed by the language, but rather depends on the platform (hardware and ABI) it is compiled for.
\rmvspace

\begin{fullTable}{llll}{\texttt{C} data type & typical 32-bit & ia32 & x86-64}{Comparison of byte-sizes for each datatype on different architectures}
    \texttt{char}        & 1 & 1     & 1  \\
    \texttt{short}       & 2 & 2     & 2  \\
    \texttt{int}         & 4 & 4     & 4  \\
    \texttt{long}        & 4 & 4     & 8  \\
    \texttt{long long}   & 8 & 8     & 8  \\
    \texttt{float}       & 4 & 4     & 4  \\
    \texttt{double}      & 8 & 8     & 8  \\
    \texttt{long double} & 8 & 10/12 & 16 \\
\end{fullTable}

\drmvspace
\warn{Type format} Be aware, however, that this table uses the \texttt{LP64} format for the x86-64 sizes,
which is the format used by UNIX-like systems (e.g. Linux, BSD, Darwin (the Mac kernel)).
64 bit Windows however uses \texttt{LLP64}, i.e. \texttt{int} and \texttt{long} have the same size (32 bit) and \texttt{long long} and pointers are 64 bit.
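
To check which sizes apply on a given machine, one can simply print the result of \texttt{sizeof} for each type. The following is a minimal sketch; its output depends on the platform and compiler, so no fixed output is given (only \texttt{sizeof(char)} is guaranteed to be $1$).
\begin{code}{c}
#include <stdio.h>

int main(void) {
    /* sizeof yields a size_t, which is printed with %zu */
    printf("char:        %zu\n", sizeof(char));
    printf("short:       %zu\n", sizeof(short));
    printf("int:         %zu\n", sizeof(int));
    printf("long:        %zu\n", sizeof(long));
    printf("long long:   %zu\n", sizeof(long long));
    printf("float:       %zu\n", sizeof(float));
    printf("double:      %zu\n", sizeof(double));
    printf("long double: %zu\n", sizeof(long double));
    printf("pointer:     %zu\n", sizeof(void *));
    return 0;
}
\end{code}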


\content{Integers} By default, integers in \lC\ are \texttt{signed}; to declare an unsigned integer, use \texttt{unsigned int}.
Since it is hard and annoying to remember the number of bytes that are in each data type, \texttt{C99} has introduced the fixed-width integer types,
which are provided by \texttt{stdint.h} and are of the form \texttt{int<bit count>\_t} and \texttt{uint<bit count>\_t},
where we substitute \texttt{<bit count>} with the number of bits (it has to correspond to a valid type of course, e.g. 8, 16, 32 or 64).
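
A short sketch of how these types might be used is given here. The variable names are arbitrary; the \texttt{PRI...} macros from \texttt{inttypes.h} supply the matching \texttt{printf} format specifiers.
\begin{code}{c}
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int32_t  a = -42;          /* exactly 32 bits, signed   */
    uint8_t  b = 255;          /* exactly 8 bits, unsigned  */
    uint64_t c = UINT64_MAX;   /* largest 64-bit unsigned value */

    printf("%" PRId32 " %" PRIu8 " %" PRIu64 "\n", a, b, c);

    b++;                       /* wraps around to 0 (arithmetic modulo 2^8) */
    printf("%" PRIu8 "\n", b);
    return 0;
}
\end{code}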


\content{Booleans} Another notable difference of \texttt{C} compared to other languages is that \texttt{C} doesn't natively have a \texttt{boolean} type.
By convention an integer type (e.g. \texttt{short} or \texttt{int}) is used to represent it, where any non-zero value means \texttt{true} and \texttt{0} means \texttt{false}.
Since boolean values are quite handy, the \texttt{!} operator for negation turns any non-zero value of any integer type into \texttt{0} and \texttt{0} into \texttt{1}.
\texttt{C99} has added support for a \texttt{bool} type via \texttt{stdbool.h}, which however is still an integer type.
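
A small sketch of these rules (the values are chosen arbitrarily):
\begin{code}{c}
#include <stdbool.h>
#include <stdio.h>

int main(void) {
    int x = 42;
    printf("%d %d\n", !x, !!x);   /* prints "0 1" */

    bool b = 42;                  /* conversion to bool normalizes to 0 or 1 */
    printf("%d\n", b);            /* prints "1" */

    if (x) {
        puts("any non-zero value counts as true");
    }
    return 0;
}
\end{code}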


\content{Implicit casts} Notably, \texttt{C} doesn't have a very rigid type system and lower bit-count types are implicitly cast to higher bit-count data types, i.e.
if you add a \texttt{short} and an \texttt{int}, the \texttt{short} is cast to \texttt{int} (bits 16-31 are filled with copies of the sign bit, or with $0$ for an \texttt{unsigned short}) and the two are added.
Explicit casting between almost all types is also supported.
Most casts will not change the bit representation, but some will force a change (notably, when casting to and from \texttt{float}-like types); a cast to \texttt{void} simply discards the value.
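
The following sketch illustrates both kinds of conversion with arbitrarily chosen values: the implicit widening of a \texttt{short} in an addition and an explicit cast from a floating-point type.
\begin{code}{c}
#include <stdio.h>

int main(void) {
    short s = -2;
    int i = 100000;
    int sum = s + i;       /* s is sign-extended to int before the addition */
    printf("%d\n", sum);   /* prints 99998 */

    double d = 3.99;
    int t = (int)d;        /* explicit cast: the fraction is discarded */
    printf("%d\n", t);     /* prints 3 */
    return 0;
}
\end{code}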


\content{Expressions} In \lC, assignments (like most operations) are expressions that yield a value, and every expression can be used as a statement; see above code block for an example.


\content{Void} The \texttt{void} type has \bi{no} value and is used for untyped pointers and for declaring functions with no return value.


\content{Structs} Are like classes in OOP, but they contain no logic.
We can copy a struct by assignment and they behave just like everything else in \texttt{C} when used as an argument for functions
in that they are passed by value and not by reference.
You can of course also pass it by reference (like any other data type) by setting the parameter type to \texttt{struct mystruct * name} and then calling the function using
\texttt{func(\&test)}, assuming \texttt{test} is the name of your struct variable.
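
The difference between the two ways of passing a struct could be sketched like this. The type \texttt{struct point} and both functions are made up for the example.
\begin{code}{c}
#include <stdio.h>

struct point { int x; int y; };

/* Receives a copy: the change is not visible to the caller */
void move_copy(struct point p) { p.x += 10; }

/* Receives a pointer: the change affects the caller's struct */
void move_in_place(struct point *p) { p->x += 10; }

int main(void) {
    struct point a = { 1, 2 };
    struct point b = a;             /* assignment copies all members */
    b.y = 99;                       /* does not touch a */

    move_copy(a);
    printf("%d %d\n", a.x, a.y);    /* prints "1 2" */

    move_in_place(&a);
    printf("%d %d\n", a.x, b.y);    /* prints "11 99" */
    return 0;
}
\end{code}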


\content{Typedef} Use \texttt{typedef <type it represents> <name of the new type>} to define a custom type name.

You may also use \texttt{typedef} on structs using \texttt{typedef struct <struct tag> <name of the new alias>},
you can thus instead of e.g. \verb|struct list_el my_list;| write \verb|list my_list;|, if you have used \verb|typedef struct list_el list;| before.
It is even possible to combine the struct definition and the \texttt{typedef}:
\drmvspace
\begin{code}{c}
typedef struct list_el {
    unsigned long val;
    struct list_el *next;
} list_el;

struct list_el my_list;
list_el my_other_list;
\end{code}
\rmvspace

\content{Namespaces}
\lC\ has a few different namespaces, i.e. you can use the same name once in each namespace (e.g. you can have both \texttt{struct a} and \texttt{int a}).
The following namespaces were covered:
\rmvspace
\begin{itemize}[noitemsep]
    \item Label names (used for \texttt{goto})
    \item Tags (for \texttt{struct}, \texttt{union} and \texttt{enum})
    \item Member names: one namespace for each \texttt{struct} and \texttt{union}
    \item Mostly everything else (variable names, function names, enumeration constants, etc., including \texttt{typedef} names)
\end{itemize}
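
As a deliberately contrived sketch, the same identifier \texttt{a} can be used in all four namespaces at once. This is legal \lC, but of course not recommended style.
\begin{code}{c}
#include <stdio.h>

struct a { int a; };        /* tag "a" and member "a" */

int main(void) {
    struct a a = { 5 };     /* ordinary identifier "a" */
    goto a;                 /* label "a" */
a:
    printf("%d\n", a.a);    /* prints 5 */
    return 0;
}
\end{code}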
46
semester3/spca/parts/01_c/01_basics/03_operators.tex
Normal file
@@ -0,0 +1,46 @@
\newpage
\subsubsection{Operators}
The list of operators in \lC\ is similar to that of \texttt{Java}, etc.
In Table \ref{tab:c-operators}, you can see an overview of the operators, sorted by precedence in descending order.
You may notice that the \verb|&| and \verb|*| operators appear twice. The higher precedence occurrence is the address operator and dereference, respectively,
and the lower precedence is \texttt{bitwise and} and \texttt{multiplication}, respectively.

Very low precedence belongs to the boolean operators \verb|&&| and \texttt{||}, as well as the ternary operator and the assignment operators.
\begin{table}[h!]
    \begin{tables}{ll}{Operator & Associativity}
        \texttt{() [] -> .}                                   & Left-to-right \\
        \verb|! ~ ++ -- + - * & (type) sizeof|                & Right-to-left \\
        \verb|* / %|                                          & Left-to-right \\
        \verb|+ -|                                            & Left-to-right \\
        \verb|<< >>|                                          & Left-to-right \\
        \verb|< <= >= >|                                      & Left-to-right \\
        \verb|== !=|                                          & Left-to-right \\
        \verb|&| (bitwise and)                                & Left-to-right \\
        \verb|^| (bitwise xor)                                & Left-to-right \\
        \texttt{|} (bitwise or)                               & Left-to-right \\
        \verb|&&| (boolean and)                               & Left-to-right \\
        \texttt{||} (boolean or)                              & Left-to-right \\
        \texttt{? :} (ternary)                                & Right-to-left \\
        \verb!= += -= *= /= %= &= ^= |= <<= >>=!              & Right-to-left \\
        \verb|,|                                              & Left-to-right \\
    \end{tables}
    \caption{\lC\ operators ordered in descending order by precedence}
    \label{tab:c-operators}
\end{table}

\shade{blue}{Associativity}
\begin{itemize}
    \item Left-to-right: $A + B + C \mapsto (A + B) + C$
    \item Right-to-left: \texttt{A += B += C} $\mapsto$ \texttt{A += (B += C)}
\end{itemize}

As it should be, boolean and, as well as boolean or, support early termination (short-circuit evaluation): the right operand is only evaluated if the left operand does not already determine the result.
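
Early termination can be observed with a function that has a side effect. In this sketch, the counter \texttt{calls} and the helper \texttt{touch} are hypothetical names introduced only to make the evaluation visible.
\begin{code}{c}
#include <stdio.h>

int calls = 0;

/* Hypothetical helper: records that it was evaluated and returns its argument */
int touch(int value) {
    calls++;
    return value;
}

int main(void) {
    int r1 = 0 && touch(1);   /* left side is false: touch is not called */
    int r2 = 1 || touch(1);   /* left side is true:  touch is not called */
    int r3 = 1 && touch(0);   /* left side is true:  touch is called once */
    printf("%d %d %d calls=%d\n", r1, r2, r3, calls);   /* prints "0 1 0 calls=1" */
    return 0;
}
\end{code}
A common use of this is guarding a dereference, e.g. \verb|if (p != NULL && p->val > 0)|.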

The ternary operator works as in other programming languages: \verb|result = expr ? res_true : res_false;|

As previously touched on, assignments are expressions as well, i.e. the following works:
\mint{c}|printf("%s", x = foo(y)); // prints output of foo(y) and x has that value|

Pre-increment (\texttt{++i}, new value returned) and post-increment (\texttt{i++}, old value returned) are also supported by \lC.
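
A brief sketch with arbitrary values that combines the increment operators, a chained assignment and the ternary operator:
\begin{code}{c}
#include <stdio.h>

int main(void) {
    int i = 5;
    int a = i++;               /* a = 5 (old value), i becomes 6 */
    int b = ++i;               /* i becomes 7, b = 7 (new value) */

    int x, y;
    x = y = 3;                 /* right-to-left: x = (y = 3) */

    int m = (a > b) ? a : b;   /* maximum of a and b */

    printf("%d %d %d %d %d %d\n", a, b, i, x, y, m);   /* prints "5 7 7 3 3 7" */
    return 0;
}
\end{code}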

\lC\ has an \texttt{assert} macro (from \texttt{assert.h}), but do not use it for error handling, since assertions are compiled out when \texttt{NDEBUG} is defined. The basic syntax is \texttt{assert( expr );}
8
semester3/spca/parts/01_c/01_basics/04_arrays.tex
Normal file
@@ -0,0 +1,8 @@
\newpage
\subsubsection{Arrays}
\label{sec:c-arrays}
The \lC\ compiler does not do any array bound checks! Thus, always check array bounds yourself.
Unlike in some other programming languages, arrays do \bi{not} have dynamic length.

The below snippet already includes some pointer arithmetic tricks. The variable \texttt{data} is a pointer to the first element of the array.
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{03_arrays.c}
6
semester3/spca/parts/01_c/01_basics/05_strings.tex
Normal file
@@ -0,0 +1,6 @@
\subsubsection{Strings}
\lC\ doesn't have a \texttt{string} data type, but rather, strings are represented (when using \texttt{ASCII}) as \texttt{char} arrays,
with the length of the array being at least $n + 1$ (where $n$ is the number of characters of the string).
The extra element is the termination character, called the \texttt{null character}, denoted \verb|\0|.
To determine the actual length of the string (as the array may be larger than the string it holds), we can use \verb|strlen(str)| or the bounded \verb|strnlen(str, maxlen)| (POSIX) from \texttt{string.h}.
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{04_strings.c}
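
The difference between the size of the array and the length of the string it holds can be seen in this small sketch (the buffer size of 16 is an arbitrary choice):
\begin{code}{c}
#include <stdio.h>
#include <string.h>

int main(void) {
    char s[16] = "hello";   /* the remaining 11 bytes are filled with '\0' */

    printf("%zu\n", sizeof s);       /* prints 16: size of the array     */
    printf("%zu\n", strlen(s));      /* prints 5:  characters before \0  */
    printf("%d\n", s[5] == '\0');    /* prints 1:  the terminator        */
    printf("%zu\n", sizeof "hello"); /* prints 6:  literal includes \0   */
    return 0;
}
\end{code}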
39
semester3/spca/parts/01_c/01_basics/06_integers.tex
Normal file
@@ -0,0 +1,39 @@
\subsubsection{Integers in C}
As a reminder, integers are encoded as follows, with $x_i$ being the $i$-th bit ($x_0$ being the least significant bit) and $w$ being the number of bits used to represent the number:
\begin{itemize}[noitemsep]
    \item \bi{Unsigned}: $\displaystyle \sum_{i = 0}^{w - 1} x_i \cdot 2^i$
    \item \bi{Signed}: $\displaystyle -x_{w - 1} \cdot 2^{w - 1} + \sum_{i = 0}^{w - 2} x_i \cdot 2^i$ (two's complement notation, with $x_{w - 1}$ being the sign-bit)
\end{itemize}
The minimum representable number is $0$ and $-2^{w - 1}$, respectively, whereas the maximum representable number is $2^w - 1$ and $2^{w - 1} - 1$.
\verb|limits.h| defines constants for the minimum and maximum values of different types, e.g. \verb|ULONG_MAX| or \verb|LONG_MAX| and \verb|LONG_MIN|.

We can use the shift operators to multiply and divide by powers of two. Shift operations are usually \textit{much} cheaper than multiplication and division.
Left shift (\texttt{u << k} in \lC) always fills with zeros and throws away the extra bits on the left (equivalent to multiplication by $2^k$),
whereas right shift (\texttt{u >> k} in \lC) is always a logical shift for unsigned values (fill with zeros, unsigned division by $2^k$), but implementation-defined for negative signed values:
it is either an arithmetic shift (fill with most significant bit, division by $2^k$; this however rounds incorrectly, see below)
or a logical shift.

Signed division using arithmetic right shifts has the issue of incorrect rounding when the number is $< 0$: the shift rounds towards $-\infty$, whereas division in \lC\ rounds towards zero.
Instead, we compute $s / 2^k = (s + (2^k - 1)) \texttt{ >> } k$ for $s < 0$ and $s / 2^k = s \texttt{ >> } k$ for $s \geq 0$.
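
The bias correction could be written as follows. This is only a sketch: the helper \verb|div_pow2| is a hypothetical name, and the code assumes that \texttt{>>} on a negative \texttt{int} is an arithmetic shift (implementation-defined, but the case on gcc and clang) and that $0 \leq k < 31$.
\begin{code}{c}
#include <stdio.h>

/* Divide s by 2^k, rounding towards zero like the / operator.
   Assumes an arithmetic right shift for negative values and 0 <= k < 31. */
int div_pow2(int s, int k) {
    int bias = (s < 0) ? (1 << k) - 1 : 0;   /* 2^k - 1 for negative s */
    return (s + bias) >> k;
}

int main(void) {
    /* Plain shift rounds towards -infinity, division rounds towards zero */
    printf("%d %d\n", -7 >> 1, -7 / 2);                    /* "-4 -3" with arithmetic shift */
    printf("%d %d\n", div_pow2(-7, 1), div_pow2(7, 1));    /* "-3 3" */
    return 0;
}
\end{code}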

\bi{In expressions that mix signed and unsigned operands (of the same width), signed values are implicitly cast to unsigned}

This can lead to all sorts of nasty exploits (e.g. provide $-1$ as the length argument to \texttt{memcpy}, where it is interpreted as a huge unsigned value, and watch it burn; this was an actual exploit in FreeBSD).
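
The effect can be reproduced in a few lines. The helper \verb|less_than| is a hypothetical name; most compilers will warn about the mixed comparison, which is exactly the point of the sketch.
\begin{code}{c}
#include <stdint.h>
#include <stdio.h>

/* Mixed comparison: s is converted to unsigned before comparing */
int less_than(int s, unsigned int u) {
    return s < u;
}

int main(void) {
    printf("%d\n", less_than(-1, 1));   /* prints 0: -1 becomes UINT_MAX */

    int len = -1;
    size_t n = (size_t)len;             /* what a size_t parameter such as memcpy's receives */
    printf("%d\n", n == SIZE_MAX);      /* prints 1 */
    return 0;
}
\end{code}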

\fhlc{Cyan}{Addition \& Subtraction}

A nice property of the two's complement notation is that addition and subtraction work exactly the same as for unsigned numbers, since results that over- or underflow simply wrap around.
This also means that it implements modular arithmetic, i.e.
\mrmvspace
\begin{align*}
    \texttt{Add}_w (u, v) = (u + v) \bmod 2^w \ \text{ and } \ \texttt{Sub}_w (u, v) = (u - v) \bmod 2^w
\end{align*}

\mrmvspace
\fhlc{Cyan}{Multiplication \& Division}

Unsigned multiplication together with addition forms a commutative ring.
Again, it performs modular arithmetic:
\begin{align*}
    \texttt{UMult}_w (u, v) = (u \cdot v) \bmod 2^w
\end{align*}
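
For $w = 8$ the wrap-around is easy to verify by hand. The following sketch uses \verb|uint8_t| with arbitrarily chosen operands; the explicit casts are needed because the operands are promoted to \texttt{int} before the operation.
\begin{code}{c}
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t u = 200, v = 100;

    uint8_t sum  = (uint8_t)(u + v);   /* (200 + 100) mod 256 = 44  */
    uint8_t diff = (uint8_t)(v - u);   /* (100 - 200) mod 256 = 156 */
    uint8_t prod = (uint8_t)(u * v);   /* 20000 mod 256       = 32  */

    printf("%u %u %u\n", (unsigned)sum, (unsigned)diff, (unsigned)prod);   /* prints "44 156 32" */
    return 0;
}
\end{code}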
53
semester3/spca/parts/01_c/01_basics/07_pointers.tex
Normal file
@@ -0,0 +1,53 @@
\newpage
\subsubsection{Pointers}
On loading of a program, the OS creates the virtual address space for the process, inspects the executable and loads the data to the right places in the address space,
before other preparations like final linking and relocation are done.

Stack-based languages (supporting recursion) allocate the stack in frames that contain local variables, return information and temporary space.
When a procedure is entered, a stack frame is allocated and any necessary setup code (like moving the stack pointer, see later) is executed. % TODO: Link to correct section
When a procedure returns, the stack frame is deallocated and any necessary cleanup code is executed, before execution of the previous frame continues.

\bi{In \lC\ a pointer is a variable whose value is the memory address of another variable}

Of note is that if you simply declare a pointer using \texttt{type * p;} without initializing it, its value is indeterminate. Even the addresses of valid variables differ from run to run,
because the (Linux) kernel randomizes the address space to prevent some common exploits.
\inputcodewithfilename{c}{code-examples/00_c/00_basics/}{05_pointers.c}

\newpage
\begin{scriptsize}
    Some pointer arithmetic has already appeared in section \ref{sec:c-arrays}, but the same kind of content with a better explanation can be found here.
\end{scriptsize}

\content{Pointer Arithmetic} Note that when doing pointer arithmetic, adding $1$ will move the pointer by \texttt{sizeof(type)} bytes.
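
A minimal sketch of this scaling, with a made-up array \texttt{data}: the difference of two pointers is counted in elements, while the distance in bytes becomes visible after casting to \texttt{char *}.
\begin{code}{c}
#include <stddef.h>
#include <stdio.h>

int main(void) {
    int data[4] = { 10, 20, 30, 40 };
    int *p = data;
    int *q = p + 1;      /* advances by one int, i.e. sizeof(int) bytes */

    printf("%d\n", *q);                                   /* prints 20 */
    printf("%td\n", q - p);                               /* prints 1 (elements) */
    printf("%d\n", (char *)q - (char *)p == (ptrdiff_t)sizeof(int));   /* prints 1 */
    return 0;
}
\end{code}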

You may use pointer arithmetic on whatever pointer you'd like (as long as it's not a null pointer).
This means, you \textit{can} make an array wherever in memory you'd like.
The issue is just that you are likely to overwrite something, and that something might be something critical (like a saved stack pointer),
thus you will get \bi{undefined} behaviour! This is, by the way, a common concept in \lC: where something isn't easy to make more flexible, the docs simply state that you get undefined behaviour if you do not do as they say
(for example for \texttt{malloc}: if you pass a pointer to memory that is not the start of the \texttt{malloc}'d section, you get undefined behaviour). So\dots RTFM!

As already seen in the section on arrays (section \ref{sec:c-arrays}), we can use pointer arithmetic for accessing array elements.
The array name is treated as a pointer to the first element of the array, except when:
\begin{itemize}[noitemsep]
    \item it is the operand of \texttt{sizeof} (return value is $n \cdot \texttt{sizeof(type)}$ with $n$ the number of elements)
    \item its address is taken (then \texttt{\&a} and \texttt{a} hold the same address, but \texttt{\&a} has type pointer-to-array)
    \item it is a string literal used to initialize a \texttt{char} array (\texttt{char a[] = "String";} copies the literal into the array). In contrast, for a pointer to a string literal, \texttt{char *b = "String";},
          the \texttt{"String"} is typically stored in a read-only segment and if we modify the string through the pointer, we get undefined behaviour
\end{itemize}
\shade{purple}{Fun fact}: \texttt{A[i]} is defined as \texttt{*(A + i)} and is always rewritten that way by the compiler.
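
The exceptions and the fun fact can be checked with a short sketch (array names and contents are arbitrary):
\begin{code}{c}
#include <stdio.h>

int main(void) {
    int a[5] = { 1, 2, 3, 4, 5 };
    char s[] = "String";      /* array initialized from a literal: a writable copy */
    s[0] = 's';               /* fine, modifies the copy */

    printf("%zu\n", sizeof a / sizeof a[0]);     /* prints 5: sizeof sees the whole array */
    printf("%d\n", (void *)&a == (void *)a);     /* prints 1: same address, different types */
    printf("%d %d\n", a[2], 2[a]);               /* prints "3 3": both mean *(a + 2) */
    printf("%s\n", s);                           /* prints "string" */
    return 0;
}
\end{code}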

\content{Function arguments} Another important aspect is passing by value or by reference.
You can pass every data type by reference (i.e. by passing a pointer to it); you cannot however pass an array by value (as an array is treated as a pointer, see above).

\content{Body-less loops}
\rmvspace
\begin{code}{c}
int x = 0;
while ( x++ < 10 ); // This is (of course) not a useful snippet, but shows the concept
\end{code}

\content{Function pointers}
A function can be passed as an argument to another function using the typical address syntax with the \verb|&| symbol. The corresponding parameter is declared using
\verb|type (* name)(type arg1, ...)|
and the function is called using \verb|(*name)(arg1, ...)|.
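
Put together, this could look as follows. The functions \texttt{add}, \texttt{mul} and \texttt{apply} are hypothetical examples; note that the \verb|&| and the explicit dereference are optional, since a function name converts to a pointer to the function anyway.
\begin{code}{c}
#include <stdio.h>

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

/* op is a pointer to a function taking two ints and returning an int */
int apply(int (*op)(int, int), int x, int y) {
    return (*op)(x, y);   /* op(x, y) would be equivalent */
}

int main(void) {
    printf("%d\n", apply(&add, 3, 4));   /* prints 7  */
    printf("%d\n", apply(mul, 3, 4));    /* prints 12 */
    return 0;
}
\end{code}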
|
||||
Reference in New Issue
Block a user