[SPCA] CPP toolchain, prepare unorthodox control flow slides summary

2026-01-15 16:33:05 +01:00
parent f45cd52de5
commit d874ce7fa9
6 changed files with 93 additions and 12 deletions


@@ -13,10 +13,21 @@ However, the individual parts are usually not called individually, but using the
 \texttt{gcc} has (as of \texttt{GCC 15.2.1 20260103} on Arch Linux) about 1000 CLI arguments that can be passed.
 Below is a list of the most important flags that can be passed, as discussed in the lectures:
-\begin{tables}{ll}{Flag & Description}
+\begin{table}[h!]
+\begin{tables}{ll}{Flag & Description}
 \texttt{-E} & Stop after the preprocessor (output is a \texttt{.i} file) \\
 \texttt{-S} & Stop after the compiler (output is assembly in \texttt{.s} file) \\
 \texttt{-c} & Stop after the assembler (output is \texttt{.o} file) \\
 \texttt{-o} & Specify the executable name \\
 \texttt{-DNDEBUG} & Removes all assert statements \\
-\end{tables}
+\texttt{-OX} & Optimization level where \texttt{X} can be one of \texttt{0, 1, 2, 3} \\
+\texttt{-g} & Compile with debugging information \\
+\texttt{-Wall} & Enable common warnings \\
+\texttt{-Wextra} & Enable more warnings \\
+\texttt{-Werror} & Makes all warnings errors \\
+\texttt{-march=XXX} & Optimize for the architecture (can be e.g. \texttt{native}, \texttt{x86-64-v4}, \dots) \\
+\texttt{-fno-tree-vectorize} & Do not vectorize code (\texttt{-O3} commonly enables vectorization) \\
+\end{tables}
+\caption{Command line flags for GCC}
+\label{tab:gcc-flags}
+\end{table}


@@ -0,0 +1,69 @@
\newpage
\subsection{Compiler optimizations}
While the compiler can do quite a bit to speed up code, it can't rework the core logic, as it has to guarantee that the executable does exactly what was specified in the code.
So, it is important to consider more than just the asymptotic runtime (\texttt{100n} and \texttt{5n} are both $\tco{n}$, but obviously the latter is 20 times faster).
We thus need to optimize the algorithms, data representations, loops, etc., and for that, we need to properly understand how programs are compiled and executed and how the hardware works.
When using \texttt{gcc}, it is usually a good idea to compile a final build with the \texttt{-O2} or \texttt{-O3} flags.
The \texttt{-march} flag was already mentioned in table \ref{tab:gcc-flags} and can be used if you want to go above and beyond, as it will optimize for the specific hardware.
The values that can be passed to \texttt{-march} are listed \hlhref{https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html}{here} and even include a specific CPU microarchitecture.
For example, to compile for Intel Alder Lake (12000 series), you can specify \texttt{-march=alderlake}.
To understand what you need to optimize, you need to understand what the compiler is good at:
\begin{itemize}
\item Register allocation
\item Scheduling (i.e. code selection and ordering)
\item Dead code elimination
\item Eliminating minor (!) inefficiencies
\end{itemize}
and what it is not good at:
\begin{itemize}
\item Improving asymptotic efficiency (the compiler can't turn BubbleSort into e.g. QuickSort)
\item Improving the constant factor (if your implementation is slow, it likely won't magically become faster, though some bad practices can be eliminated)
\item Overcoming other optimization blockers such as memory aliasing and procedure side-effects
\end{itemize}
\content{Code motion} is a compiler technique that moves computations which always produce the same result out of loops.
However, always remember that the compiler will be \bi{conservative}, i.e. it will only apply such a transformation if it can prove that the behaviour of the program does not change.
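As a minimal sketch of code motion (the function names \texttt{set\_row\_naive} and \texttt{set\_row\_hoisted} are made up for this illustration), the product \texttt{n * i} below does not depend on the loop variable and can therefore be computed once before the loop. An optimizing compiler will usually perform this step itself; the second function only shows what the result looks like.

```c
#include <stddef.h>

/* Naive version: n * i is recomputed in every iteration,
   although it does not depend on j. */
void set_row_naive(double *a, const double *b, size_t i, size_t n) {
    for (size_t j = 0; j < n; j++) {
        a[n * i + j] = b[j];
    }
}

/* After code motion: the loop-invariant product is hoisted out of the loop. */
void set_row_hoisted(double *a, const double *b, size_t i, size_t n) {
    size_t ni = n * i;
    for (size_t j = 0; j < n; j++) {
        a[ni + j] = b[j];
    }
}
```

Both functions copy \texttt{b} into row \texttt{i} of the row-major $n \times n$ matrix \texttt{a} and are interchangeable for the caller.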
\content{Strength reduction} is a compiler technique, where e.g. sequences of products are turned into cheaper additions in each iteration.
An example is that if you have an operation such as \texttt{n * i} in a loop,
the compiler might replace that with a variable \texttt{ni} that is incremented by \texttt{n} in each iteration.
Similarly, it might replace \texttt{16 * x} with \texttt{x << 4}, or the even more expensive \texttt{x / 16} with \texttt{x >> 4} (the latter directly only for unsigned \texttt{x}, as signed division rounds towards zero).
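The following sketch spells out both variants of strength reduction by hand. All function names are hypothetical and only serve the illustration; the shift helpers are restricted to unsigned values, for which shifting and multiplying or dividing by 16 agree.

```c
#include <stddef.h>

/* Direct version: one multiplication per iteration. */
void fill_multiples_mul(long *out, long n, size_t count) {
    for (size_t i = 0; i < count; i++) {
        out[i] = n * (long)i;
    }
}

/* Strength-reduced version: the running sum ni replaces the product n * i. */
void fill_multiples_add(long *out, long n, size_t count) {
    long ni = 0;
    for (size_t i = 0; i < count; i++) {
        out[i] = ni;
        ni += n;
    }
}

/* For unsigned values, multiplying or dividing by 16 equals shifting by 4. */
unsigned times16(unsigned x) { return x << 4; }
unsigned div16(unsigned x) { return x >> 4; }
```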
\content{Common sub-expressions} can be computed once and then reused, so that the individual steps only need cheaper operations.
A good example are several similar index computations (e.g. for neighbouring array elements), which can share one multiplication and then only need one addition or subtraction each to reach the nearby result.
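A sketch of this idea, assuming a row-major $n \times n$ array and an inner element at position $(i, j)$ (the function names are invented for this example): the four neighbour indices all derive from the shared sub-expression \texttt{i * n + j}, so a single multiplication suffices.

```c
#include <stddef.h>

/* Naive version: i * n, (i - 1) * n and (i + 1) * n are all multiplied out. */
double neighbours_naive(const double *val, size_t i, size_t j, size_t n) {
    double up    = val[(i - 1) * n + j];
    double down  = val[(i + 1) * n + j];
    double left  = val[i * n + j - 1];
    double right = val[i * n + j + 1];
    return up + down + left + right;
}

/* Shared sub-expression: one multiplication, then only additions and subtractions. */
double neighbours_shared(const double *val, size_t i, size_t j, size_t n) {
    size_t inj = i * n + j;
    double up    = val[inj - n];
    double down  = val[inj + n];
    double left  = val[inj - 1];
    double right = val[inj + 1];
    return up + down + left + right;
}
```

Both versions assume that $(i, j)$ is not on the border of the array, as no bounds are checked.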
\subsubsection{Optimization blockers}
A sure-fire way to make your code slow is by using a large number of procedure calls.
They are among the slowest operations in \lC.
Moreover, the compiler cannot safely move the function call out of a \texttt{for} loop like this:
\begin{code}{c}
int i;
/* strlen(s) is re-evaluated before every iteration */
for (i = 0; i < strlen(s); i++) {
    if (s[i] >= 'A' && s[i] <= 'Z') {
        s[i] -= ('A' - 'a'); /* convert to lower case */
    }
}
\end{code}
The compiler can't safely remove \texttt{strlen(s)} from the loop, as the call may have side-effects,
i.e. it may modify program state besides returning a value, and it might not return the same value every time (here, the loop body writes to \texttt{s}).
Thus, only call functions in the loop condition if you need their side-effects; otherwise, pre-compute the result and simply compare against a variable.
\begin{scriptsize}
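With \texttt{gcc}, you can declare a function \textit{side-effect free} using \verb|__attribute__((pure))| in the function declaration.
The compiler may then move calls to it out of a loop, provided it can also show that the memory the function reads is not modified inside the loop.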
\end{scriptsize}
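A minimal sketch of the manual fix for the loop above, wrapped in a function that is here called \texttt{lower} (the name is an assumption for this example): the length is computed once before the loop. This is safe because the loop only replaces letters and never writes a terminating \verb|'\0'|, so the length of the string cannot change.

```c
#include <stddef.h>
#include <string.h>

/* Converts all upper-case ASCII letters in s to lower case, in place. */
void lower(char *s) {
    size_t len = strlen(s); /* evaluated once instead of in every iteration */
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 'A' && s[i] <= 'Z') {
            s[i] -= ('A' - 'a');
        }
    }
}
```

With the call inside the loop condition, the string is traversed once per iteration, which makes the loop quadratic in the string length; with the pre-computed length, it is linear.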
Another common blocker is memory aliasing. This happens when two pointers point to the same address and of course,
since we can do pointer arithmetic, it is very easy to do that in \lC.
The easiest way to avoid the resulting slowdown is to use local variables where possible,
e.g. to accumulate intermediate results in a local variable instead of repeatedly reading and writing through a pointer.
Normally the compiler assumes there can be another pointer that accesses the memory pointed to by this pointer.
If you use the \texttt{restrict} keyword on a pointer (e.g. in a function declaration such as \texttt{void test(double *restrict a)}),
the compiler will assume that for the lifetime of this pointer, there are no other pointers that will be used to access the memory to which it points.
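The following sketch contrasts the two approaches for summing the rows of a row-major $n \times n$ matrix. The function names are hypothetical. In the first version, the compiler has to assume that \texttt{b} might overlap with \texttt{a} and therefore stores \texttt{b[i]} to memory in every inner iteration. The second version accumulates in a local variable and additionally promises, via \texttt{restrict}, that the two arrays do not overlap.

```c
#include <stddef.h>

/* b[i] is updated through memory in every inner iteration,
   because a and b might alias. */
void sum_rows_aliased(const double *a, double *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        b[i] = 0;
        for (size_t j = 0; j < n; j++) {
            b[i] += a[i * n + j];
        }
    }
}

/* Local accumulator: sum can be kept in a register.
   restrict promises the compiler that a and b do not overlap;
   calling this with overlapping arrays is undefined behaviour. */
void sum_rows_local(const double *restrict a, double *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double sum = 0;
        for (size_t j = 0; j < n; j++) {
            sum += a[i * n + j];
        }
        b[i] = sum;
    }
}
```

Note that \texttt{restrict} is a promise by the programmer, not something the compiler checks.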
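Another technique to improve throughput for something like matrix multiplication is to process the matrices in blocks, so that the data currently being worked on stays in the cache.
Since the compiler doesn't \textit{understand} your code, it generally can't do this for you (blocking changes the order of the floating-point operations, and the compiler cannot assume that they are associative).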
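A minimal sketch of a blocked matrix multiplication, assuming row-major $n \times n$ matrices. The name \texttt{matmul\_blocked} and the block size \texttt{BLOCK} are assumptions for this illustration; a real block size would be tuned to the cache of the target machine.

```c
#include <stddef.h>

/* Hypothetical block edge length; in practice tuned so that the
   blocks being worked on fit into the cache together. */
#define BLOCK 4

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Computes c += a * b for n x n row-major matrices, one block at a time.
   The caller zero-initialises c to obtain the plain product. */
void matmul_blocked(const double *a, const double *b, double *c, size_t n) {
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            for (size_t kk = 0; kk < n; kk += BLOCK) {
                /* Clamp the block borders if n is not a multiple of BLOCK. */
                size_t i_end = min_size(ii + BLOCK, n);
                size_t j_end = min_size(jj + BLOCK, n);
                size_t k_end = min_size(kk + BLOCK, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t j = jj; j < j_end; j++) {
                        double sum = c[i * n + j]; /* local accumulator */
                        for (size_t k = kk; k < k_end; k++) {
                            sum += a[i * n + k] * b[k * n + j];
                        }
                        c[i * n + j] = sum;
                    }
                }
            }
        }
    }
}
```

The partial sums for each element of \texttt{c} are added in a different grouping than in the straightforward triple loop, which is why the result can differ in the last bits for general floating-point inputs.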


@@ -82,7 +82,7 @@ If there are changes and you'd like to update this summary, please open a pull r
 \end{center}
-\newsection
+\newpage
 \section{x86 Assembly}
 \input{parts/00_asm/00_intro.tex}
 \input{parts/00_asm/01_syntax/00_intro.tex}
@@ -103,7 +103,7 @@ If there are changes and you'd like to update this summary, please open a pull r
 % ── Intro to C ──────────────────────────────────────────────────────
-\newsection
+\newpage
 \section{The C Programming Language}
 \input{parts/01_c/00_intro.tex}
 \input{parts/01_c/01_basics/00_intro.tex}
@@ -123,11 +123,12 @@ If there are changes and you'd like to update this summary, please open a pull r
 \input{parts/01_c/05_vulnerabilities.tex}
-\newsection
+\newpage
 \section{The gcc toolchain}
 \input{parts/02_toolchain/00_intro.tex}
-\input{parts/02_toolchain/01_linking.tex}
-\input{parts/02_toolchain/02_file_types.tex}
+\input{parts/02_toolchain/01_compiler-optimizations.tex}
+\input{parts/02_toolchain/02_linking.tex}
+\input{parts/02_toolchain/03_file_types.tex}
 % ── Hardware recap ──────────────────────────────────────────────────