\newpage
\subsection{Compiler optimizations}
While the compiler can do quite a bit to speed up code, it can't rework the core logic, as it has to guarantee that the executable does exactly what was specified in the code.

So, it is really important to consider not only the asymptotic runtime, but also the constant factors (\texttt{100n} and \texttt{5n} are both $\tco{n}$, but obviously the latter is 20 times faster).
We thus need to optimize the algorithms, data representations, loops, etc.\ and for that, we need to properly understand how programs are compiled and executed, and how the hardware works.

When using \texttt{gcc}, it is usually a good idea to compile a final build with the \texttt{-O2} or \texttt{-O3} flags.

The \texttt{-march} flag was already mentioned in table \ref{tab:gcc-flags} and can be used if you want to go above and beyond, as it will optimize for the specific hardware
(the resulting binary may then no longer run on CPUs that lack the instructions used).
The values that can be passed to \texttt{-march} are listed \hlhref{https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html}{here} and even include specific CPU microarchitectures.
For example, to compile for Intel Alder Lake (12000 series), you can specify \texttt{-march=alderlake}.

To understand what you need to optimize, you need to understand what the compiler is good at:
\begin{itemize}
    \item Register allocation
    \item Scheduling (i.e. code selection and ordering)
    \item Dead code elimination
    \item Eliminating minor (!) inefficiencies
\end{itemize}
and what it is not good at:
\begin{itemize}
    \item Improving asymptotic efficiency (the compiler can't turn BubbleSort into e.g. QuickSort)
    \item Improving the constant factor (if your implementation is slow, it likely won't magically become faster, though some bad practices can be eliminated)
    \item Overcoming other optimization blockers such as memory aliasing and procedure side-effects
\end{itemize}

\content{Code motion} is a compiler technique that moves computations which always produce the same result out of loops, so that they are only evaluated once.
However: Always remember that the compiler will be \bi{conservative}, i.e. it will always err on the side of caution and only move code if it can prove that this does not change the behaviour of the program.

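As a minimal sketch of code motion (the function names \texttt{set\_row\_naive} and \texttt{set\_row\_hoisted} are made up for illustration), the second function below does by hand what the compiler would typically do to the first one: the product \texttt{n * i} does not depend on \texttt{j} and is therefore computed once before the loop.

```c
#include <stddef.h>

/* Before: n * i is recomputed in every iteration,
   although it does not depend on the loop counter j. */
void set_row_naive(double *a, const double *b, size_t i, size_t n) {
    for (size_t j = 0; j < n; j++) {
        a[n * i + j] = b[j];
    }
}

/* After code motion: the loop-invariant product is computed once. */
void set_row_hoisted(double *a, const double *b, size_t i, size_t n) {
    size_t ni = n * i; /* invariant inside the loop */
    for (size_t j = 0; j < n; j++) {
        a[ni + j] = b[j];
    }
}
```

Both functions copy \texttt{b} into row \texttt{i} of the row-major $n \times n$ matrix \texttt{a} and produce identical results.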
\content{Strength reduction} is a compiler technique that replaces expensive operations with cheaper ones, e.g. a multiplication in each iteration is turned into a cheaper addition.
An example is that if you have an operation such as \texttt{n * i} in a loop with counter \texttt{i},
the compiler might replace that with a variable \texttt{ni} that is incremented by \texttt{n} in each iteration.
Similarly, it might replace \texttt{16 * x}, or the even more expensive \texttt{x / 16}, with \texttt{x << 4} or \texttt{x >> 4}, respectively
(for signed \texttt{x}, the division additionally needs a small correction for negative values, as the shift alone rounds towards $-\infty$ instead of towards zero).

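The two forms of strength reduction can be sketched as follows. The function names are hypothetical, and unsigned types are used on purpose so that the shifts are exactly equivalent to the multiplication and division.

```c
#include <stddef.h>

/* Before: one multiplication per iteration. */
void fill_multiples_mul(unsigned long *out, unsigned long n, size_t count) {
    for (size_t i = 0; i < count; i++) {
        out[i] = n * i;
    }
}

/* After strength reduction: ni always holds n * i and is
   updated with a single addition per iteration. */
void fill_multiples_add(unsigned long *out, unsigned long n, size_t count) {
    unsigned long ni = 0;
    for (size_t i = 0; i < count; i++) {
        out[i] = ni;
        ni += n;
    }
}

/* For unsigned values, multiplying or dividing by 16 is a plain shift. */
unsigned times16(unsigned x) { return x << 4; }
unsigned div16(unsigned x) { return x >> 4; }
```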
\content{Common sub-expressions} can be extracted into a pre-computation, so that the individual steps only need cheaper operations.
A good example is a set of similar index computations such as \texttt{(i - 1) * n + j} and \texttt{(i + 1) * n + j}: once \texttt{i * n + j} has been computed, each of them only requires one addition or subtraction.
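A sketch of sharing a common sub-expression, assuming a row-major $n \times n$ grid \texttt{val} and an interior cell \texttt{(i, j)} (the function names are made up for illustration): the first version needs three multiplications for the indices, the second only one.

```c
#include <stddef.h>

/* Before: (i - 1) * n, (i + 1) * n and i * n are computed separately. */
double neighbor_sum_naive(const double *val, size_t i, size_t j, size_t n) {
    double up    = val[(i - 1) * n + j];
    double down  = val[(i + 1) * n + j];
    double left  = val[i * n + j - 1];
    double right = val[i * n + j + 1];
    return up + down + left + right;
}

/* After: i * n + j is computed once, the neighbours are reached
   with one addition or subtraction each. */
double neighbor_sum_shared(const double *val, size_t i, size_t j, size_t n) {
    size_t inj = i * n + j;
    double up    = val[inj - n];
    double down  = val[inj + n];
    double left  = val[inj - 1];
    double right = val[inj + 1];
    return up + down + left + right;
}
```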
\subsubsection{Optimization blockers}
A sure-fire way to make your code slow is by using a large number of procedure calls.
Each call has overhead (passing arguments, saving registers, jumping and returning) and, unless the function is inlined, the compiler has to treat it as a black box.
As a consequence, the compiler cannot safely move the function call out of a for loop like this:
\begin{code}{c}
// Converts all upper-case letters in s to lower case.
// strlen(s) is evaluated again in every iteration.
int i;
for (i = 0; i < strlen(s); i++) {
    if (s[i] >= 'A' && s[i] <= 'Z') {
        s[i] -= ('A' - 'a');
    }
}
\end{code}
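The compiler can't safely remove \texttt{strlen(s)} from the loop, as it has to assume that the call may have side-effects,
i.e. that it may modify program state other than simply returning a value.
In addition, the loop body writes to \texttt{s}, so the compiler would have to prove that the length of the string never changes.
Since \texttt{strlen} itself is linear in the length of the string, the loop above has quadratic runtime.
Thus, only ever call functions in the loop condition when you need the side-effects and otherwise, pre-compute the result and simply use a variable to check against.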
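A minimal sketch of the pre-computed variant, wrapped in a function called \texttt{lower} here (the name is an assumption, it does not appear above). This is only valid because the loop never writes a terminating \texttt{'\textbackslash 0'} and thus never changes the length of the string.

```c
#include <stddef.h>
#include <string.h>

/* Converts all upper-case ASCII letters in s to lower case, in place. */
void lower(char *s) {
    size_t len = strlen(s); /* computed once instead of once per iteration */
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 'A' && s[i] <= 'Z') {
            s[i] -= ('A' - 'a');
        }
    }
}
```

With the call moved out of the loop condition, the runtime is linear in the length of the string.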
\begin{scriptsize}
    You can declare a function \textit{side-effect free} in \texttt{gcc} by adding \verb|__attribute__((pure))| or \verb|__attribute__((const))|
    (the latter is stricter, as the function is also not allowed to read global memory or memory reached through pointer arguments) to the function declaration.
    The compiler may then move a call such as \texttt{strlen(s)} out of the loop, provided it can also show that the memory read by the function is not modified inside the loop.
\end{scriptsize}

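The attributes can be sketched as follows, assuming \texttt{gcc} or \texttt{clang} (they are compiler extensions, not standard C). The helpers \texttt{count\_char}, \texttt{square} and \texttt{sum\_squares} are hypothetical and only serve as an illustration. Whether the call is actually moved out of the loop still depends on the optimization level.

```c
#include <stddef.h>

/* pure: no side effects, but may read memory (here: the string s). */
size_t count_char(const char *s, char c) __attribute__((pure));
/* const: no side effects and the result depends on the argument values only. */
int square(int x) __attribute__((const));

size_t count_char(const char *s, char c) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == c) {
            n++;
        }
    }
    return n;
}

int square(int x) { return x * x; }

/* The loop body does not write to memory, so the compiler is allowed
   to evaluate count_char(s, c) once instead of in every iteration. */
int sum_squares(const char *s, char c) {
    int total = 0;
    for (size_t i = 0; i < count_char(s, c); i++) {
        total += square((int)i);
    }
    return total;
}
```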
Another common blocker is memory aliasing. This happens when two pointers refer to the same memory location and of course,
since we can do pointer arithmetic, it is very easy to create such aliases in \lC.
The easiest way to keep aliasing from blocking optimizations is to use local variables where possible (e.g. to accumulate intermediate results),
such that the values do not need to be accessed through a pointer.

Normally the compiler has to assume that there can be another pointer that accesses the memory pointed to by a given pointer.
If you use the \texttt{restrict} keyword on the pointer (i.e. in a function declaration, we have \texttt{void test(double *restrict a)}),
the compiler will assume that for the lifetime of this pointer, there are no other pointers that will be used to access the memory to which it points.
It is up to you to guarantee this, as breaking that promise results in undefined behaviour.

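Both remedies can be sketched with a function that sums the rows of a row-major $n \times n$ matrix \texttt{a} into a vector \texttt{b} (all function names are made up for illustration). In the first version, \texttt{b[i]} has to be written back to memory in every inner iteration, because \texttt{b} might point into \texttt{a}.

```c
#include <stddef.h>

/* a and b may alias: b[i] is loaded and stored in every inner iteration. */
void sum_rows(const double *a, double *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        b[i] = 0.0;
        for (size_t j = 0; j < n; j++) {
            b[i] += a[i * n + j];
        }
    }
}

/* Local accumulator: the sum can stay in a register,
   b[i] is written exactly once per row. */
void sum_rows_local(const double *a, double *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++) {
            sum += a[i * n + j];
        }
        b[i] = sum;
    }
}

/* restrict: the caller promises that a and b do not overlap,
   so the compiler may keep b[i] in a register on its own. */
void sum_rows_restrict(const double *restrict a, double *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        b[i] = 0.0;
        for (size_t j = 0; j < n; j++) {
            b[i] += a[i * n + j];
        }
    }
}
```

For non-overlapping arguments, all three versions compute the same result. Only the first two are well-defined if \texttt{b} points into \texttt{a}.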
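Another technique to improve throughput for something like matrix multiplication is to process the matrices in blocks that fit into the cache, so that data that was loaded once is reused before it is evicted.
Since the compiler doesn't \textit{understand} your code, it can't do this for you: blocking changes the order of the additions and thus relies on the associativity of the operation, which the compiler may not assume for floating-point arithmetic.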
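A minimal sketch of a blocked matrix multiplication, assuming row-major $n \times n$ matrices and a block size \texttt{bs} chosen by the caller such that the blocks fit into the cache (the names \texttt{matmul\_blocked} and \texttt{min\_size} are hypothetical):

```c
#include <stddef.h>

/* Returns the smaller of two sizes (used for the block borders). */
static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Computes c += a * b block by block.
   Assumes bs > 0 and that the caller has initialised c (e.g. to zero). */
void matmul_blocked(const double *a, const double *b, double *c,
                    size_t n, size_t bs) {
    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = min_size(ii + bs, n);
        for (size_t jj = 0; jj < n; jj += bs) {
            size_t j_end = min_size(jj + bs, n);
            for (size_t kk = 0; kk < n; kk += bs) {
                size_t k_end = min_size(kk + bs, n);
                /* Multiply block (ii, kk) of a with block (kk, jj) of b. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t j = jj; j < j_end; j++) {
                        double sum = c[i * n + j]; /* local accumulator */
                        for (size_t k = kk; k < k_end; k++) {
                            sum += a[i * n + k] * b[k * n + j];
                        }
                        c[i * n + j] = sum;
                    }
                }
            }
        }
    }
}
```

The partial sums for each entry of \texttt{c} are accumulated block by block, which is exactly the kind of reordering the compiler is not allowed to do on its own.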