mirror of
https://github.com/janishutz/eth-summaries.git
synced 2026-03-14 10:50:05 +01:00
[SPCA] CPP toolchain, prepare unorthodox control flow slides summary
@@ -13,10 +13,21 @@ However, the individual parts are usually not called individually, but using the
|
|||||||
|
|
||||||
\texttt{gcc} has (as of \texttt{GCC 15.2.1 20260103} on Arch Linux) about 1000 CLI arguments that can be passed.
|
\texttt{gcc} has (as of \texttt{GCC 15.2.1 20260103} on Arch Linux) about 1000 CLI arguments that can be passed.
|
||||||
Below is a list of the most important flags that can be passed, as discussed in the lectures:
|
Below is a list of the most important flags that can be passed, as discussed in the lectures:
|
||||||
|
\begin{table}[h!]
|
||||||
\begin{tables}{ll}{Flag & Description}
|
\begin{tables}{ll}{Flag & Description}
|
||||||
\texttt{-E} & Stop after the preprocessor (output is a \texttt{.i} file) \\
|
\texttt{-E} & Stop after the preprocessor (output is a \texttt{.i} file) \\
|
||||||
\texttt{-S} & Stop after the compiler (output is assembly in \texttt{.s} file) \\
|
\texttt{-S} & Stop after the compiler (output is assembly in \texttt{.s} file) \\
|
||||||
\texttt{-c} & Stop after the assembler (output is \texttt{.o} file) \\
|
\texttt{-c} & Stop after the assembler (output is \texttt{.o} file) \\
|
||||||
\texttt{-o} & Specify the name of the output file (e.g. the executable) \\
|
\texttt{-o} & Specify the name of the output file (e.g. the executable) \\
|
||||||
\texttt{-DNDEBUG} & Removes all assert statements \\
|
\texttt{-DNDEBUG} & Removes all assert statements \\
|
||||||
|
\texttt{-OX} & Optimization level where \texttt{X} can be one of \texttt{0, 1, 2, 3} \\
|
||||||
|
\texttt{-g} & Compile with debugging information \\
|
||||||
|
\texttt{-Wall} & Enable common warnings \\
|
||||||
|
\texttt{-Wextra} & Enable more warnings \\
|
||||||
|
\texttt{-Werror} & Makes all warnings errors \\
|
||||||
|
\texttt{-march=XXX} & Optimize for the architecture (can be e.g. \texttt{native}, \texttt{x86-64-v4}, \dots) \\
|
||||||
|
\texttt{-fno-tree-vectorize} & Do not vectorize code (\texttt{-O3} commonly enables vectorization) \\
|
||||||
\end{tables}
|
\end{tables}
|
||||||
|
\caption{Command line flags for GCC}
|
||||||
|
\label{tab:gcc-flags}
|
||||||
|
\end{table}
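
As a small illustration of the \texttt{-DNDEBUG} flag from the table: the sketch below uses a hypothetical helper \texttt{checked\_div} (not from the lecture). Built normally, the \texttt{assert} aborts the program on a zero divisor; built with \texttt{-DNDEBUG}, the \texttt{assert} macro expands to nothing and the check disappears.

```c
#include <assert.h>

/* Hypothetical helper for illustration: integer division with a sanity check. */
int checked_div(int a, int b) {
    /* This check is compiled out entirely when building with -DNDEBUG. */
    assert(b != 0);
    return a / b;
}
```

Asserts should therefore never contain code that the program relies on, since that code vanishes in a \texttt{-DNDEBUG} build.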
|
||||||
|
|||||||
@@ -0,0 +1,69 @@
|
|||||||
|
\newpage
|
||||||
|
\subsection{Compiler optimizations}
|
||||||
|
While the compiler can do quite a bit to speed up code, it can't rework the core logic, as it has to guarantee that the executable does exactly what was specified in the code.
|
||||||
|
|
||||||
|
So, it is really important to consider not only the asymptotic runtime, but also the constant factors (\texttt{100n} and \texttt{5n} are both $\tco{n}$, but obviously the latter is 20 times faster).
|
||||||
|
We thus need to optimize the algorithms, data representations, loops, etc., and for that, we need to properly understand how programs are compiled and executed, and how the hardware works.
|
||||||
|
|
||||||
|
When using \texttt{gcc}, it is usually a good idea to compile a final build with the \texttt{-O2} or \texttt{-O3} flags.
|
||||||
|
|
||||||
|
The \texttt{-march} flag was already mentioned in table \ref{tab:gcc-flags} and can be used if you want to go above and beyond, as it will optimize for the specific hardware.
|
||||||
|
The values that can be passed to \texttt{-march} are listed \hlhref{https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html}{here} and even include a specific CPU microarchitecture.
|
||||||
|
For example, to compile for Intel Alder Lake (12000 series), you can specify \texttt{-march=alderlake}.
|
||||||
|
|
||||||
|
To understand what you need to optimize, you need to understand what the compiler is good at:
|
||||||
|
\begin{itemize}
|
||||||
|
\item Register allocation
|
||||||
|
\item Scheduling (i.e. code selection and ordering)
|
||||||
|
\item Dead code elimination
|
||||||
|
\item Eliminating minor (!) inefficiencies
|
||||||
|
\end{itemize}
|
||||||
|
and what it is not good at:
|
||||||
|
\begin{itemize}
|
||||||
|
\item Improving asymptotic efficiency (the compiler can't turn BubbleSort into e.g. QuickSort)
|
||||||
|
\item Improving the constant factor (if your implementation is slow, it likely won't magically become faster, though some bad practices can be eliminated)
|
||||||
|
\item Overcoming other optimization blockers such as memory aliasing and procedure side-effects
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
\content{Code motion} is a compiler technique that moves computations which produce the same result in every iteration out of the loop.
|
||||||
|
However, always remember that the compiler will be \bi{conservative}, i.e. it will always err on the side of caution and only apply an optimization if it can prove that the behaviour of the program does not change.
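
A minimal sketch of code motion done by hand, assuming a row-major $n \times n$ matrix \texttt{a}; the function names \texttt{set\_row\_naive} and \texttt{set\_row\_hoisted} are hypothetical. The product \texttt{n * i} does not depend on the loop variable \texttt{j}, so it can be computed once before the loop.

```c
#include <stddef.h>

/* Before: n * i is written inside the loop although it never changes there. */
void set_row_naive(double *a, const double *b, size_t i, size_t n) {
    for (size_t j = 0; j < n; j++) {
        a[n * i + j] = b[j];
    }
}

/* After: the loop-invariant product is computed once, outside the loop. */
void set_row_hoisted(double *a, const double *b, size_t i, size_t n) {
    size_t ni = n * i;
    for (size_t j = 0; j < n; j++) {
        a[ni + j] = b[j];
    }
}
```

For a simple case like this one, an optimizing build will likely perform the transformation itself; writing it out by hand mainly matters when the invariant expression is something the compiler cannot prove to be invariant.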
|
||||||
|
|
||||||
|
\content{Strength reduction} is a compiler technique that replaces an expensive operation with a cheaper one, e.g. a multiplication in each iteration is turned into an addition.
|
||||||
|
An example is that if you have an operation such as \texttt{n * i} in a loop,
|
||||||
|
the compiler might replace that with a variable \texttt{ni} that is incremented by \texttt{n} in each iteration.
|
||||||
|
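Similarly, it might replace \texttt{16 * x} or the even more expensive \texttt{x / 16} with \texttt{x << 4} or \texttt{x >> 4}, respectively (for signed \texttt{x}, the division additionally needs a correction for negative values, as a plain shift rounds differently).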
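
The following sketch writes out both forms of strength reduction by hand. All function names are hypothetical, and the shift variants use \texttt{unsigned} so that shifting and multiplying or dividing by 16 agree for every input.

```c
#include <stddef.h>

/* Before: the row offset n * i costs one multiplication per inner iteration. */
void fill_mul(int *a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            a[n * i + j] = (int)j;
        }
    }
}

/* After: the offset ni is kept in a variable and advanced by n per row. */
void fill_add(int *a, size_t n) {
    size_t ni = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            a[ni + j] = (int)j;
        }
        ni += n; /* replaces the multiplication n * i */
    }
}

/* Multiplication and division by a power of two expressed as shifts. */
unsigned times16(unsigned x) { return x << 4; }
unsigned div16(unsigned x) { return x >> 4; }
```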
|
||||||
|
|
||||||
|
\content{Common sub-expressions} can be computed once up front and then reused, so that the individual steps only need cheaper operations.
|
||||||
|
A good example is a group of similar multiplications whose results lie close together: one product is computed once and the others are reached from it with a single addition or subtraction each.
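
A sketch of this idea, assuming a row-major $n \times n$ grid \texttt{val} and an interior position $(i, j)$; the function names are hypothetical. The naive version spells out a multiplication for every neighbour index, the second version computes \texttt{i * n + j} once and derives all four neighbours from it.

```c
#include <stddef.h>

/* Before: every neighbour index contains its own multiplication. */
double neighbours_naive(const double *val, size_t i, size_t j, size_t n) {
    double up    = val[(i - 1) * n + j];
    double down  = val[(i + 1) * n + j];
    double left  = val[i * n + j - 1];
    double right = val[i * n + j + 1];
    return up + down + left + right;
}

/* After: one shared product, the rest are additions and subtractions. */
double neighbours_shared(const double *val, size_t i, size_t j, size_t n) {
    size_t inj = i * n + j; /* common sub-expression */
    double up    = val[inj - n];
    double down  = val[inj + n];
    double left  = val[inj - 1];
    double right = val[inj + 1];
    return up + down + left + right;
}
```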
|
||||||
|
|
||||||
|
\subsubsection{Optimization blockers}
|
||||||
|
A sure-fire way to make your code slow is by using a large number of procedure calls.
|
||||||
|
They are among the slowest operations in \lC.
|
||||||
|
In addition, the compiler cannot safely move the function call out of a \texttt{for} loop like this:
|
||||||
|
\begin{code}{c}
|
||||||
|
int i;
|
||||||
|
for (i = 0; i < strlen(s); i++) {
|
||||||
|
if (s[i] >= 'A' && s[i] <= 'Z') {
|
||||||
|
s[i] -= ('A' - 'a');
|
||||||
|
}
|
||||||
|
}
|
||||||
|
\end{code}
|
||||||
|
The compiler can't safely remove \texttt{strlen(s)} from the loop, as it may have side-effects,
|
||||||
|
i.e. it may modify program state instead of simply returning a value, and its result may change between iterations because the loop body writes to \texttt{s}.
|
||||||
|
Thus, only call a function in the loop condition if you actually need its side-effects; otherwise, pre-compute the value and simply check against a variable.
|
||||||
|
\begin{scriptsize}
|
||||||
|
You can declare a function \textit{side-effect free} using \verb|__attribute__((pure))| in the function declaration.
|
||||||
|
The compiler may then extract \texttt{strlen(s)} from the loop.
|
||||||
|
\end{scriptsize}
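
The loop from above with the call moved out by hand might look as follows (the function name \texttt{lower\_hoisted} is hypothetical). The length is computed once, so \texttt{strlen} no longer runs in every iteration and the loop goes from quadratic to linear time in the length of \texttt{s}.

```c
#include <string.h>

/* Convert all upper-case ASCII letters in s to lower case, in place. */
void lower_hoisted(char *s) {
    /* Safe to pre-compute: the loop never writes a '\0', so the length stays the same. */
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 'A' && s[i] <= 'Z') {
            s[i] -= ('A' - 'a');
        }
    }
}
```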
|
||||||
|
|
||||||
|
Another common blocker is memory aliasing. This happens when two pointers point to the same address and of course,
|
||||||
|
since we can do pointer arithmetic, it is very easy to do that in \lC.
|
||||||
|
The easiest way to prevent this from happening is to use local variables where possible,
|
||||||
|
so that intermediate results are not read and written through a pointer on every step.
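
A sketch of the local-variable approach, assuming a row-major $n \times n$ matrix \texttt{a} whose row sums are stored in \texttt{b}; the function names are hypothetical. In the first version the compiler has to assume that \texttt{b} may overlap \texttt{a}, so \texttt{b[i]} is updated in memory on every inner iteration. The second version accumulates in a local variable and writes once per row.

```c
#include <stddef.h>

/* Before: b[i] is read and written through the pointer in the inner loop. */
void sum_rows_mem(const double *a, double *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        b[i] = 0.0;
        for (size_t j = 0; j < n; j++) {
            b[i] += a[i * n + j];
        }
    }
}

/* After: the running sum lives in a local variable, which cannot alias a. */
void sum_rows_local(const double *a, double *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++) {
            sum += a[i * n + j];
        }
        b[i] = sum;
    }
}
```

Note that the two versions only behave identically if \texttt{a} and \texttt{b} do not overlap, which is exactly why the compiler does not make this change on its own.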
|
||||||
|
|
||||||
|
Normally the compiler assumes there can be another pointer that accesses the memory pointed to by this pointer.
|
||||||
|
If you use the \texttt{restrict} keyword on the pointer (i.e. in a function declaration, we have \texttt{void test(double *restrict a)}),
|
||||||
|
the compiler will assume that for the lifetime of this pointer, there are no other pointers that will be used to access the memory to which it points.
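
A minimal sketch of \texttt{restrict} in use; the function \texttt{scale\_add} and its parameters are hypothetical. Both pointers are qualified, which is a promise by the caller that \texttt{dst} and \texttt{src} never overlap.

```c
#include <stddef.h>

/* dst[i] += k * src[i]; the caller guarantees that dst and src do not overlap. */
void scale_add(double *restrict dst, const double *restrict src,
               double k, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] += k * src[i];
    }
}
```

The compiler does not check this promise. Calling the function with overlapping arrays is undefined behaviour.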
|
||||||
|
|
||||||
|
Another technique to improve throughput for something like matrix multiplications is to do it in blocks due to the way caching works.
|
||||||
|
Since the compiler doesn't \textit{understand} your code, it can't do this for you (the transformation relies on the operation being associative, which the compiler may not simply assume).
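
A rough sketch of a blocked matrix multiplication for row-major $n \times n$ matrices. The function name and the block size \texttt{BLOCK} are hypothetical, and the sketch assumes that $n$ is a multiple of \texttt{BLOCK}; in practice the block size would be tuned to the cache of the target machine.

```c
#include <stddef.h>

#define BLOCK 2 /* hypothetical block size; n must be a multiple of it */

/* Computes c = a * b, working on BLOCK x BLOCK tiles to improve cache reuse. */
void matmul_blocked(const double *a, const double *b, double *c, size_t n) {
    for (size_t i = 0; i < n * n; i++) {
        c[i] = 0.0;
    }
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            for (size_t kk = 0; kk < n; kk += BLOCK) {
                /* Multiply one tile of a with one tile of b. */
                for (size_t i = ii; i < ii + BLOCK; i++) {
                    for (size_t j = jj; j < jj + BLOCK; j++) {
                        double sum = c[i * n + j];
                        for (size_t k = kk; k < kk + BLOCK; k++) {
                            sum += a[i * n + k] * b[k * n + j];
                        }
                        c[i * n + j] = sum;
                    }
                }
            }
        }
    }
}
```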
|
||||||
@@ -82,7 +82,7 @@ If there are changes and you'd like to update this summary, please open a pull r
|
|||||||
\end{center}
|
\end{center}
|
||||||
|
|
||||||
|
|
||||||
\newsection
|
\newpage
|
||||||
\section{x86 Assembly}
|
\section{x86 Assembly}
|
||||||
\input{parts/00_asm/00_intro.tex}
|
\input{parts/00_asm/00_intro.tex}
|
||||||
\input{parts/00_asm/01_syntax/00_intro.tex}
|
\input{parts/00_asm/01_syntax/00_intro.tex}
|
||||||
@@ -103,7 +103,7 @@ If there are changes and you'd like to update this summary, please open a pull r
|
|||||||
|
|
||||||
|
|
||||||
% ── Intro to C ──────────────────────────────────────────────────────
|
% ── Intro to C ──────────────────────────────────────────────────────
|
||||||
\newsection
|
\newpage
|
||||||
\section{The C Programming Language}
|
\section{The C Programming Language}
|
||||||
\input{parts/01_c/00_intro.tex}
|
\input{parts/01_c/00_intro.tex}
|
||||||
\input{parts/01_c/01_basics/00_intro.tex}
|
\input{parts/01_c/01_basics/00_intro.tex}
|
||||||
@@ -123,11 +123,12 @@ If there are changes and you'd like to update this summary, please open a pull r
|
|||||||
\input{parts/01_c/05_vulnerabilities.tex}
|
\input{parts/01_c/05_vulnerabilities.tex}
|
||||||
|
|
||||||
|
|
||||||
\newsection
|
\newpage
|
||||||
\section{The gcc toolchain}
|
\section{The gcc toolchain}
|
||||||
\input{parts/02_toolchain/00_intro.tex}
|
\input{parts/02_toolchain/00_intro.tex}
|
||||||
\input{parts/02_toolchain/01_linking.tex}
|
\input{parts/02_toolchain/01_compiler-optimizations.tex}
|
||||||
\input{parts/02_toolchain/02_file_types.tex}
|
\input{parts/02_toolchain/02_linking.tex}
|
||||||
|
\input{parts/02_toolchain/03_file_types.tex}
|
||||||
|
|
||||||
|
|
||||||
% ── Hardware recap ──────────────────────────────────────────────────
|
% ── Hardware recap ──────────────────────────────────────────────────
|
||||||
|
|||||||