diff --git a/semester3/spca/parts/02_toolchain/01_linking.tex b/semester3/spca/parts/02_toolchain/01_linking.tex
new file mode 100644
index 0000000..cccc7e8
--- /dev/null
+++ b/semester3/spca/parts/02_toolchain/01_linking.tex
@@ -0,0 +1,55 @@
+\subsection{Linking}
+
+Linking is the final step in the compilation pipeline: separately compiled object files are combined into an executable.
+
+Using a linker has two main advantages:
+\begin{enumerate}
+    \item \textbf{Separate Compilation}: Changing one source file requires recompiling only that file, followed by relinking.
+    \item \textbf{Space Optimization}: Executable code only contains functions (e.g. from libraries) that are actually used.
+\end{enumerate}
+
+\subsubsection{Symbol Resolution}
+
+The first step during Linking is Symbol Resolution.
+
+In the context of Linking, functions and global or static variables are considered \textit{Symbols}. Compilers store all symbol definitions in a \textit{Symbol Table}.
+The linker associates each symbol reference with \textit{exactly one} definition.
+
+\inlinedef \textbf{Symbol types}
+
+\begin{itemize}
+    \item \textbf{Global Symbols} are defined in a module and can be referenced by other modules (e.g. non-\texttt{static} functions and globals in \texttt{C})
+    \item \textbf{External Symbols} are globals that a module references but that are defined elsewhere
+    \item \textbf{Local Symbols} are defined and referenced exclusively in one module (e.g. \texttt{static} in \texttt{C})
+\end{itemize}
+
+Note: Local linker symbols and local program variables are \textit{not} the same: non-\texttt{static} local variables live on the stack and have no linker symbol.
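The three symbol types, and the note about local variables, can be illustrated with a small C module. This is only a minimal sketch: the names `call_count`, `scale`, `scaled_length` and `measure` are hypothetical and chosen purely for illustration; `strlen` is the usual libc function.

```c
#include <string.h>  /* declares strlen, which is defined in libc, not in this module */

/* Global symbol: non-static, so other modules may reference it. */
int call_count = 0;

/* Local symbol: static, so it is visible only inside this module. */
static int scale = 3;

/* Local symbol: a static function cannot be referenced from other modules. */
static int scaled_length(const char *s)
{
    /* `len` is an ordinary local variable on the stack: it has no linker symbol.
       `strlen` is an external symbol: referenced here, defined elsewhere. */
    int len = (int)strlen(s);
    return scale * len;
}

/* Global symbol: non-static function. */
int measure(const char *s)
{
    /* A static local variable does get a (local) linker symbol,
       even though it is only in scope inside this function. */
    static int last = 0;

    call_count++;
    last = scaled_length(s);
    return last;
}
```

Assuming the module is compiled without optimization (e.g. `gcc -c`), running `nm` on the object file would typically list `measure` and `call_count` with uppercase type letters (global), `scale`, `scaled_length` and the static local `last` with lowercase letters (local), and `strlen` as `U` (undefined, to be resolved by the linker). The stack variable `len` does not appear at all.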
+
+\inlinedef \textbf{Symbol strength}
+
+Duplicate definitions of uninitialized globals either cause a linker error (\texttt{-fno-common}, the default since GCC 10) or are merged by the linker (\texttt{-fcommon}).
+
+\begin{itemize}
+    \item \textbf{Strong Symbols} are procedure names and initialized globals
+    \item \textbf{Weak Symbols} are uninitialized globals (with \texttt{-fcommon})
+\end{itemize}
+
+In \texttt{C}, function symbols can be explicitly declared weak using either of:
+\begin{minted}{C}
+    #pragma weak func
+    __attribute__((weak)) void func(void);
+\end{minted}
+
+\content{Duplicate Handling} The linker uses symbol strength to handle duplicate definitions:
+
+\begin{enumerate}
+    \item Multiple strong symbols with the same name are illegal (linker error)
+    \item Given a strong symbol and multiple weak symbols, the linker picks the strong symbol
+    \item Given multiple weak symbols, the linker chooses an \textit{arbitrary} one
+\end{enumerate}
+
+\subsubsection{Relocation}
+
+The second step during Linking is Relocation.
+
+Code and data sections of the separate object files are combined, and symbols are relocated from relative locations (in \texttt{.o} files) to absolute locations (in the executable).
diff --git a/semester3/spca/parts/02_toolchain/02_file_types.tex b/semester3/spca/parts/02_toolchain/02_file_types.tex
new file mode 100644
index 0000000..53556c7
--- /dev/null
+++ b/semester3/spca/parts/02_toolchain/02_file_types.tex
@@ -0,0 +1,64 @@
+\subsection{File types}
+
+The most common file types used during compilation:
+
+\begin{itemize}
+    \item \textbf{Source Code File} (\texttt{.c}) Uncompiled source code in \texttt{C}.
+    \item \textbf{Relocatable Object File} (\texttt{.o}) Code \& Data in a format ready for linking.
+    \item \textbf{Shared Object File} (\texttt{.so}) Special object file that can be loaded \& linked dynamically, at load time or at run time.
+    \item \textbf{Executable File} Code \& Data in a format that can be directly copied into memory \& run.
+    \item \textbf{Archive File} (\texttt{.a}) Concatenation of related \texttt{.o} files into a single file, with an index.
+\end{itemize}
+
+\content{Alternate names} On Windows, \texttt{.dll} files are used instead of \texttt{.so} and are called \textit{Dynamic Link Libraries}.
+
+\subsubsection{Executable and Linkable Format (ELF)}
+
+The standard unified format for object files (executables, \texttt{.o}, \texttt{.so}) on Linux and most other Unix-like systems.
+
+\begin{center}
+    \begin{tabular}{l|l}
+        \textbf{Section} & \textbf{Content} \\
+        \hline
+        ELF header & basic information: word size, byte ordering, file type, machine type \\
+        Segment header table & page size, virtual address memory segments, segment sizes \\
+        \texttt{.text} & actual code \\
+        \texttt{.rodata} & read-only data, e.g. jump tables \\
+        \texttt{.data} & initialized global variables \\
+        \texttt{.bss} & uninitialized global variables \\
+        \texttt{.symtab} & symbol table: procedure and static variable names, section names \& locations \\
+        \texttt{.rel.text} & relocation info for \texttt{.text}, e.g. addresses of instructions that need modifying \\
+        \texttt{.rel.data} & relocation info for \texttt{.data}, e.g.
addresses of pointers that need modifying \\
+        \texttt{.debug} & info for symbolic debugging (\texttt{gcc -g})
+    \end{tabular}
+\end{center}
+
+% Compact table with only the segment names
+
+% \begin{center}
+%     \begin{tabular}{|c|}
+%         \hline
+%         ELF header \\
+%         \hline
+%         Segment header table \\
+%         \hline
+%         \texttt{.text} \\
+%         \hline
+%         \texttt{.rodata} \\
+%         \hline
+%         \texttt{.data} \\
+%         \hline
+%         \texttt{.bss} \\
+%         \hline
+%         \texttt{.symtab} \\
+%         \hline
+%         \texttt{.rel.text} \\
+%         \hline
+%         \texttt{.rel.data} \\
+%         \hline
+%         \texttt{.debug} \\
+%         \hline
+%         Section header table \\
+%         \hline
+%     \end{tabular}
+% \end{center}
\ No newline at end of file
diff --git a/semester3/spca/spca-summary.pdf b/semester3/spca/spca-summary.pdf
index 42617b4..180e044 100644
Binary files a/semester3/spca/spca-summary.pdf and b/semester3/spca/spca-summary.pdf differ
diff --git a/semester3/spca/spca-summary.tex b/semester3/spca/spca-summary.tex
index 9b0ed76..a5e1ed2 100644
--- a/semester3/spca/spca-summary.tex
+++ b/semester3/spca/spca-summary.tex
@@ -125,6 +125,8 @@ If there are changes and you'd like to update this summary, please open a pull r
 \newsection
 \section{The gcc toolchain}
 \input{parts/02_toolchain/00_intro.tex}
+\input{parts/02_toolchain/01_linking.tex}
+\input{parts/02_toolchain/02_file_types.tex}
 
 
 % ── Hardware recap ──────────────────────────────────────────────────