diff --git a/semester3/spca/parts/00_asm/06_unorthodox-control-flow.tex b/semester3/spca/parts/00_asm/06_unorthodox-control-flow.tex new file mode 100644 index 0000000..e69de29 diff --git a/semester3/spca/parts/02_toolchain/00_intro.tex b/semester3/spca/parts/02_toolchain/00_intro.tex index 85ccbfc..905e5b3 100644 --- a/semester3/spca/parts/02_toolchain/00_intro.tex +++ b/semester3/spca/parts/02_toolchain/00_intro.tex @@ -13,10 +13,21 @@ However, the individual parts are usually not called individually, but using the \texttt{gcc} has (as of \texttt{GCC 15.2.1 20260103} on Arch Linux) about 1000 CLI arguments that can be passed. Below is a list of the most important flags that can be passed, as discussed in the lectures: -\begin{tables}{ll}{Flag & Description} - \texttt{-E} & Stop after the preprocessor (output is a \texttt{.i} file) \\ - \texttt{-S} & Stop after the compiler (output is assembly in \texttt{.s} file) \\ - \texttt{-c} & Stop after the assembler (output is \texttt{.o} file) \\ - \texttt{-o} & Specify the executable name \\ - \texttt{-DNDEBUG} & Removes all assert statements \\ -\end{tables} +\begin{table}[h!] + \begin{tables}{ll}{Flag & Description} + \texttt{-E} & Stop after the preprocessor (output is a \texttt{.i} file) \\ + \texttt{-S} & Stop after the compiler (output is assembly in \texttt{.s} file) \\ + \texttt{-c} & Stop after the assembler (output is \texttt{.o} file) \\ + \texttt{-o} & Specify the name of the output file (e.g. the executable) \\ + \texttt{-DNDEBUG} & Define the macro \texttt{NDEBUG}, which disables all \texttt{assert} statements \\ + \texttt{-OX} & Optimization level where \texttt{X} can be one of \texttt{0, 1, 2, 3} (further levels such as \texttt{s} exist) \\ + \texttt{-g} & Compile with debugging information \\ + \texttt{-Wall} & Enable common warnings \\ + \texttt{-Wextra} & Enable additional warnings not covered by \texttt{-Wall} \\ + \texttt{-Werror} & Treat all warnings as errors \\ + \texttt{-march=XXX} & Generate code optimized for the given architecture (can be e.g.
\texttt{native}, \texttt{x86-64-v4}, \dots) \\ + \texttt{-fno-tree-vectorize} & Do not vectorize code (\texttt{-O3} commonly enables vectorization) \\ + \end{tables} + \caption{Command line flags for GCC} + \label{tab:gcc-flags} +\end{table} diff --git a/semester3/spca/parts/02_toolchain/01_compiler-optimizations.tex b/semester3/spca/parts/02_toolchain/01_compiler-optimizations.tex new file mode 100644 index 0000000..4b9f53e --- /dev/null +++ b/semester3/spca/parts/02_toolchain/01_compiler-optimizations.tex @@ -0,0 +1,69 @@ +\newpage +\subsection{Compiler optimizations} +While the compiler can do quite a bit to speed up code, it can't rework the core logic, as it has to guarantee that the executable does exactly what the code specifies. + +So, it is important to consider more than just the asymptotic runtime: $100n$ and $5n$ are both $\tco{n}$, but the latter is obviously 20 times faster. +We thus need to optimize algorithms, data representations, loops, etc., and for that, we need to properly understand how programs are compiled and executed, and how the hardware works. + +When using \texttt{gcc}, it is usually a good idea to compile a final build with the \texttt{-O2} or \texttt{-O3} flags. + +The \texttt{-march} flag was already mentioned in Table~\ref{tab:gcc-flags} and can be used if you want to go above and beyond, as it lets the compiler optimize for, and use instructions specific to, the target hardware (the resulting executable may therefore not run on CPUs lacking those instructions). +The values that can be passed to \texttt{-march} are listed \hlhref{https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html}{here} and even include specific CPU microarchitectures. +For example, to compile for Intel Alder Lake (12th generation, i.e. the 12000 series), you can specify \texttt{-march=alderlake}. + +To understand what you need to optimize, you need to understand what the compiler is good at: +\begin{itemize} + \item Register allocation + \item Scheduling (i.e. code selection and ordering) + \item Dead code elimination + \item Eliminating minor (!)
inefficiencies +\end{itemize} +and what it is not good at: +\begin{itemize} + \item Improving asymptotic efficiency (the compiler can't turn e.g. BubbleSort into QuickSort) + \item Improving the constant factor (if your implementation is slow, it likely won't magically become faster, though some bad practices can be eliminated) + \item Overcoming other optimization blockers such as memory aliasing and procedure side-effects +\end{itemize} + +\content{Code motion} is a compiler technique that moves computations which always produce the same result out of loops, so that they are only evaluated once. +However, always remember that the compiler is \bi{conservative}, i.e. it will always err on the side of caution and only transform code if it can prove that the behavior stays the same. + +\content{Strength reduction} is a compiler technique that replaces expensive operations with cheaper ones, e.g. turning a sequence of products into one addition per iteration. +An example is that if you have an operation such as \texttt{n * i} in a loop, +the compiler might replace that with a variable \texttt{ni} that is incremented by \texttt{n} in each iteration. +Similarly, it might replace \texttt{16 * x} and the even more expensive \texttt{x / 16} with \texttt{x << 4} and \texttt{x >> 4}, respectively (for signed \texttt{x}, the shift needs a small correction, as division rounds negative values towards zero). + +\content{Common sub-expressions} can be computed once and reused, so that the individual steps only need cheaper operations. +A good example is accessing neighboring elements of a matrix stored in a 1D array: \texttt{i*n + j} is computed once, and the elements above and below are then reached by subtracting or adding \texttt{n}. + +\subsubsection{Optimization blockers} +A sure-fire way to make your code slow is by using a large number of procedure calls: +they are relatively expensive and, more importantly, they prevent many optimizations. +For example, the compiler cannot safely move the call to \texttt{strlen} out of a for loop like this: +\begin{code}{c} + size_t i; // size_t matches the return type of strlen + // strlen(s) is re-evaluated on every iteration, making the loop quadratic + for (i = 0; i < strlen(s); i++) { + if (s[i] >= 'A' && s[i] <= 'Z') { + s[i] -= ('A' - 'a'); // convert uppercase to lowercase + } + } +\end{code} +The compiler can't safely remove \texttt{strlen(s)} from the loop, as it may have side-effects, +i.e.
it may modify program state other than simply returning a value. Moreover, the loop body writes to \texttt{s}, which, as far as the compiler can tell, might change the result of \texttt{strlen(s)}. +Thus, only call functions in the loop condition when you actually need them to be re-evaluated; otherwise, pre-compute the value once and compare against a variable. +\begin{scriptsize} + You can declare a function \textit{side-effect free} using \verb|__attribute__((pure))| in the function declaration. + The compiler may then eliminate repeated calls with the same arguments, but only where it can prove that the memory the function reads is not modified in between (which is not the case in the loop above, as \texttt{s} is written). +\end{scriptsize} + +Another common blocker is memory aliasing, i.e. two pointers referring to the same memory location. Since pointer arithmetic is allowed in \lC, +this is very easy to do, so the compiler usually has to assume that it may happen. +The easiest way to avoid this blocker is to use local variables where possible (e.g. accumulate a result in a local variable and write it through the pointer only once at the end), +as the compiler knows that no pointer refers to a local variable whose address is never taken. + +Normally, the compiler has to assume that another pointer may access the memory pointed to by a given pointer. +If you qualify a pointer with the \texttt{restrict} keyword (e.g. in the function declaration \texttt{void test(double *restrict a)}), +the compiler may assume that for the lifetime of this pointer, no other pointer is used to access the memory to which it points. + +Another technique to improve throughput for something like matrix multiplication is to process the data in blocks, so that the parts currently being worked on stay in the cache.
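A minimal sketch of such blocking in \lC, assuming square $n \times n$ matrices of \texttt{double} stored row-major in 1D arrays. The function names \texttt{mm\_naive} and \texttt{mm\_blocked} and the block size of 32 are hypothetical choices for illustration, not taken from the lecture. Both functions compute \texttt{c += a * b}. The blocked version works on tiles, so the tiles of \texttt{a}, \texttt{b} and \texttt{c} that are in use can stay in the cache while they are reused.

```c
#include <stddef.h>

/* Hypothetical block size: ideally chosen so that three BLOCK x BLOCK
 * tiles of doubles fit into the cache; 32 is only an example. */
#define BLOCK 32

/* Naive version: c += a * b for n x n row-major matrices. */
void mm_naive(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Blocked version: also computes c += a * b, but the loops walk over
 * BLOCK x BLOCK tiles. Bounds are clamped to n, so n does not need to
 * be a multiple of BLOCK. */
void mm_blocked(size_t n, const double *a, const double *b, double *c) {
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            for (size_t kk = 0; kk < n; kk += BLOCK) {
                size_t i_end = ii + BLOCK < n ? ii + BLOCK : n;
                size_t j_end = jj + BLOCK < n ? jj + BLOCK : n;
                size_t k_end = kk + BLOCK < n ? kk + BLOCK : n;
                /* Multiply tile (ii, kk) of a with tile (kk, jj) of b
                 * and accumulate into tile (ii, jj) of c. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t j = jj; j < j_end; j++) {
                        double sum = c[i * n + j];
                        for (size_t k = kk; k < k_end; k++) {
                            sum += a[i * n + k] * b[k * n + j];
                        }
                        c[i * n + j] = sum;
                    }
                }
            }
        }
    }
}
```

A good block size depends on the cache sizes of the target machine and usually has to be found by measuring.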
+Since the compiler doesn't \textit{understand} your code, it can't do this for you (among other things, such restructuring may require assuming that the operations are associative, which e.g. floating-point addition is not). diff --git a/semester3/spca/parts/02_toolchain/01_linking.tex b/semester3/spca/parts/02_toolchain/02_linking.tex similarity index 100% rename from semester3/spca/parts/02_toolchain/01_linking.tex rename to semester3/spca/parts/02_toolchain/02_linking.tex diff --git a/semester3/spca/parts/02_toolchain/02_file_types.tex b/semester3/spca/parts/02_toolchain/03_file_types.tex similarity index 100% rename from semester3/spca/parts/02_toolchain/02_file_types.tex rename to semester3/spca/parts/02_toolchain/03_file_types.tex diff --git a/semester3/spca/spca-summary.tex b/semester3/spca/spca-summary.tex index 21e9c1b..4720c7c 100644 --- a/semester3/spca/spca-summary.tex +++ b/semester3/spca/spca-summary.tex @@ -82,7 +82,7 @@ If there are changes and you'd like to update this summary, please open a pull r \end{center} -\newsection +\newpage \section{x86 Assembly} \input{parts/00_asm/00_intro.tex} \input{parts/00_asm/01_syntax/00_intro.tex} @@ -103,7 +103,7 @@ If there are changes and you'd like to update this summary, please open a pull r % ── Intro to C ────────────────────────────────────────────────────── -\newsection +\newpage \section{The C Programming Language} \input{parts/01_c/00_intro.tex} \input{parts/01_c/01_basics/00_intro.tex} @@ -123,11 +123,12 @@ If there are changes and you'd like to update this summary, please open a pull r \input{parts/01_c/05_vulnerabilities.tex} -\newsection +\newpage \section{The gcc toolchain} \input{parts/02_toolchain/00_intro.tex} -\input{parts/02_toolchain/01_linking.tex} -\input{parts/02_toolchain/02_file_types.tex} +\input{parts/02_toolchain/01_compiler-optimizations.tex} +\input{parts/02_toolchain/02_linking.tex} +\input{parts/02_toolchain/03_file_types.tex} % ── Hardware recap ──────────────────────────────────────────────────