\subsection{Linking}
Linking is the final step in the compilation pipeline: separately compiled object files are combined into an executable.
Using a linker has two main advantages:
\begin{enumerate}
\item \textbf{Separate Compilation}: Changing one source file requires recompiling only that file (and relinking).
\item \textbf{Space Optimization}: Executables only contain the functions (e.g. from libraries) that they actually use.
\end{enumerate}
\subsubsection{Symbol Resolution}
The first step during Linking is Symbol Resolution.
In the context of linking, functions and global variables (including \texttt{static} ones) are considered \textit{symbols}. Each object file contains a \textit{symbol table} listing the symbols it defines and references.
The linker associates each symbol reference with \textit{exactly one} symbol definition.
\inlinedef \textbf{Symbol types}
\begin{itemize}
\item \textbf{Global Symbols} are defined in a module and can be referenced by other modules (e.g. non-\texttt{static} functions and global variables in \texttt{C})
\item \textbf{External Symbols} are global symbols referenced by a module but defined in another module
\item \textbf{Local Symbols} are defined and referenced exclusively within one module (e.g. \texttt{static} functions and global variables in \texttt{C})
\end{itemize}
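Note: Local linker symbols and local program variables are \textit{not} the same: non-\texttt{static} local variables live on the stack and are not linker symbols at all, whereas \texttt{static} local variables do become local linker symbols.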
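To make this classification concrete, the following module is a minimal sketch (all identifiers are hypothetical) annotated with the kind of linker symbol each name becomes. It has no \texttt{main}, so it can be compiled on its own with \texttt{gcc -c} and its symbol table inspected with \texttt{nm} or \texttt{readelf -s}.
\begin{minted}{C}
#include <stdlib.h>

int counter = 0;            /* global symbol, strong (initialized global)   */
static int calls = 0;       /* local symbol: invisible to other modules     */

static void bump(void) {    /* local symbol (static function)               */
    calls++;
}

int next_id(void) {         /* global symbol, strong (function definition)  */
    static int last = 100;  /* static local variable: local linker symbol   */
    int step = 1;           /* stack variable: NOT a linker symbol at all   */
    bump();
    last += step;
    counter = calls;
    return last;
}

int parse_int(const char *s) {
    return atoi(s);         /* atoi: not defined here, so an external symbol
                               that the linker resolves against libc        */
}
\end{minted}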
\inlinedef \textbf{Symbol strength}
With GCC, duplicate definitions of uninitialized globals either cause a link error (\texttt{-fno-common}, the default since GCC~10) or are accepted and merged (\texttt{-fcommon}).
\begin{itemize}
\item \textbf{Strong Symbols} are procedure names and initialized globals
\item \textbf{Weak Symbols} are uninitialized globals (only with \texttt{-fcommon})
\end{itemize}
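In \texttt{C}, symbols such as functions can explicitly be declared weak using the GCC/Clang extensions:
\begin{minted}{C}
#pragma weak func                       // mark the symbol func as weak
__attribute__((weak)) void func(void);  // equivalent attribute syntax
\end{minted}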
\content{Duplicate Handling} The linker resolves multiply defined symbol names using these rules:
\begin{enumerate}
\item Multiple strong symbols with the same name are not allowed (link error)
\item Given one strong symbol and one or more weak symbols, choose the strong symbol
\item Given multiple weak symbols, choose an \textit{arbitrary} one
\end{enumerate}
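The weak-symbol rules can be sketched in a single module, assuming GCC or Clang on an ELF platform such as Linux. The names \texttt{log\_level} and \texttt{optional\_hook} are hypothetical: the weak \textit{definition} of \texttt{log\_level} acts as a default that a strong definition in another module would replace (rule 2), while the weak \textit{reference} to \texttt{optional\_hook} resolves to address 0 if no module defines it.
\begin{minted}{C}
/* Weak definition: a default that a strong definition of
   log_level in any other linked object file would override. */
__attribute__((weak)) int log_level(void) {
    return 1;   /* hypothetical default level */
}

/* Weak reference: there is no definition here; if no module
   defines optional_hook, the linker resolves its address to 0. */
extern void optional_hook(void) __attribute__((weak));

/* Calls the hook only if some linked module provided it. */
int run_hook_if_present(void) {
    if (optional_hook) {
        optional_hook();
        return 1;
    }
    return 0;
}
\end{minted}
Linked on its own, \texttt{log\_level()} returns the default 1 and \texttt{run\_hook\_if\_present()} returns 0; adding an object file with a strong \texttt{int log\_level(void)} would silently take precedence over the default.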
\subsubsection{Relocation}
The second step during Linking is Relocation.
Code and data sections of the separate object files are merged, and symbol definitions and references are relocated from section-relative offsets (in the \texttt{.o} files) to absolute run-time addresses (in the executable).
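\textbf{Command line order matters} (strictly speaking already during symbol resolution): the linker scans \texttt{.o} and \texttt{.a} files from left to right and only extracts archive members that resolve currently undefined references. Libraries should therefore generally be listed \textit{last}, after the object files that use them.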
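The address computation performed during relocation can be illustrated with a simplified model. The \texttt{reloc\_entry} type below is hypothetical and only loosely modelled on ELF relocation entries: it distinguishes a PC-relative 32-bit reference (e.g. the operand of a \texttt{call}) from an absolute 32-bit reference, and patches the 4-byte placeholder left in the section.
\begin{minted}{C}
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation types and entry. */
typedef enum { RELOC_PC32, RELOC_ABS32 } reloc_type;

typedef struct {
    uint64_t   offset;  /* byte offset of the reference within its section */
    reloc_type type;
    int64_t    addend;  /* constant adjustment, e.g. -4 for PC-relative    */
} reloc_entry;

/* Patches one 32-bit reference in 'section' (placed at run-time address
   sec_addr) so that it refers to a symbol located at sym_addr. */
void apply_reloc(uint8_t *section, uint64_t sec_addr,
                 uint64_t sym_addr, const reloc_entry *r) {
    uint32_t value;
    if (r->type == RELOC_PC32) {
        uint64_t ref_addr = sec_addr + r->offset;  /* run-time address of the reference */
        value = (uint32_t)(sym_addr + r->addend - ref_addr);
    } else {
        value = (uint32_t)(sym_addr + r->addend);  /* absolute address */
    }
    memcpy(section + r->offset, &value, sizeof value);
}
\end{minted}
For example, with \texttt{.text} placed at \texttt{0x4004d0}, a \texttt{call} whose 4-byte operand sits at offset \texttt{0xf}, an addend of $-4$ and the target function at \texttt{0x4004e8} (illustrative addresses), the patched PC-relative value is $\texttt{0x4004e8} - 4 - (\texttt{0x4004d0} + \texttt{0xf}) = \texttt{0x5}$, i.e. the distance from the end of the \texttt{call} instruction to its target.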
\newpage
\subsubsection{Packaging Libraries}
Using just the linker (without archives), there are only two inconvenient ways to package libraries:
\begin{enumerate}
\item All functions in one object file $\mapsto$ every program links one big object, including all unused functions.
\item One function per object file $\mapsto$ the programmer has to list many files explicitly when linking.
\end{enumerate}
\textbf{Static Libraries} (\texttt{.a} archives) solve this: the linker searches the archive for symbols that are still undefined and copies only the matching archive \textit{members} into the executable. However, static libraries come with issues too:
\begin{enumerate}
\item Duplication in stored executables (e.g. every executable contains its own copy of the \texttt{libc.a} functions it uses)
\item Duplication in the memory of running processes
\item Any fix in a library requires every application using it to be explicitly relinked
\end{enumerate}
\textbf{Shared Libraries} solve this: they are linked dynamically, either at load time or at run time. A further advantage is that \textit{multiple} processes can share a single in-memory copy of a shared library's code. This is how, for example, \texttt{libc} is usually packaged.
At run time, shared libraries can be loaded explicitly using \texttt{dlopen}:
\inputcodewithfilename{C}{code-examples/00_c/04_toolchain/}{01_dynamic_linking.c}
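Load-time linking can be observed as well: on Linux with glibc, \texttt{dl\_iterate\_phdr} (declared in \texttt{link.h}) invokes a callback once for every ELF object currently mapped into the process. The following sketch assumes a glibc-based Linux system; it counts these objects and checks whether a shared \texttt{libc} was loaded. The helper names \texttt{count\_object} and \texttt{loaded\_objects} are hypothetical.
\begin{minted}{C}
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <string.h>

/* Called once per loaded ELF object (the executable itself,
   the dynamic loader, shared libraries, the vDSO, ...). */
static int count_object(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    int *counts = data;
    counts[0]++;                                   /* every loaded object */
    if (info->dlpi_name && strstr(info->dlpi_name, "libc.so"))
        counts[1]++;                               /* shared libc found   */
    return 0;                                      /* 0 = keep iterating  */
}

/* Returns the number of loaded objects; sets *libc_shared to 1 if a
   shared libc was mapped into the process, else to 0. */
int loaded_objects(int *libc_shared) {
    int counts[2] = {0, 0};
    dl_iterate_phdr(count_object, counts);
    *libc_shared = counts[1] > 0;
    return counts[0];
}
\end{minted}
For a dynamically linked executable the list typically includes the program itself, the vDSO, \texttt{libc.so.6} and the dynamic loader; a statically linked build reports no shared \texttt{libc}.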
\newpage