
\newpage
\subsection{The syntax}
There are two common styles of assembly syntax: AT\&T syntax (common on UNIX) and Intel syntax (common on Windows). The examples here use AT\&T syntax.

The processor state visible to the assembly programmer is:
\begin{itemize}
    \item PC (Program Counter, \texttt{\%rip} on x86-64) that contains the address of the next instruction
    \item Register file that contains heavily used program data
    \item Condition codes that store status information about the most recent arithmetic operation and are used for conditional branching
\end{itemize}

To view what \lC\ code looks like in assembly, we can use \texttt{gcc -O0 -S code.c}, which produces the file \texttt{code.s} containing the assembly code (\texttt{-O0} disables optimizations, so the output stays close to the source).

\subsubsection{Registers}
\texttt{x86} assembly is a bit particular with register naming: in AT\&T syntax, all register names start with \texttt{\%}.
The original 16-bit \texttt{x86} had the registers listed below.
Sub-registers give access to the high byte (\texttt{h} suffix) or the low byte (\texttt{l} suffix) of a register; only the registers ending in \texttt{x} have them.
These four registers, together with \texttt{\%si} and \texttt{\%di}, are general purpose:
\begin{tables}{lll}{Name    & Sub-registers                & Description}
              \texttt{\%ax} & \texttt{\%ah}, \texttt{\%al} & Accumulator         \\
              \texttt{\%cx} & \texttt{\%ch}, \texttt{\%cl} & Counter             \\
              \texttt{\%dx} & \texttt{\%dh}, \texttt{\%dl} & Data                \\
              \texttt{\%bx} & \texttt{\%bh}, \texttt{\%bl} & Base                \\
              \texttt{\%si} & -                            & Source index        \\
              \texttt{\%di} & -                            & Destination index   \\
              \hline
              \texttt{\%sp} & -                            & Stack pointer       \\
              \texttt{\%bp} & -                            & Base pointer        \\
              \texttt{\%ip} & -                            & Instruction pointer \\
              \texttt{\%sr} & -                            & Status (flags)      \\
\end{tables}
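The relationship between a 16-bit register and its sub-registers is plain bit manipulation. A minimal \lC\ sketch, modelling \texttt{\%ax} as a \texttt{uint16\_t} (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* %ah: high byte of %ax (bits 15..8) */
uint8_t get_ah(uint16_t ax) { return (uint8_t)(ax >> 8); }

/* %al: low byte of %ax (bits 7..0) */
uint8_t get_al(uint16_t ax) { return (uint8_t)(ax & 0xFF); }

/* Writing %al replaces only the low byte; %ah is left untouched. */
uint16_t set_al(uint16_t ax, uint8_t al) {
    return (uint16_t)((ax & 0xFF00) | al);
}

/* Writing %ah replaces only the high byte; %al is left untouched. */
uint16_t set_ah(uint16_t ax, uint8_t ah) {
    return (uint16_t)((ax & 0x00FF) | ((uint16_t)ah << 8));
}
```

With \texttt{\%ax} $=$ \texttt{0x1234}, \texttt{\%ah} is \texttt{0x12} and \texttt{\%al} is \texttt{0x34}.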
When the architecture was extended to 32 bits, all existing registers were retained and a 32-bit version of each was introduced, named with the prefix \texttt{e}.
Consequently, 16-bit code still works unchanged, since e.g. \texttt{\%ax} is simply the lower 16 bits of \texttt{\%eax}.

The same happened again with the extension to 64 bits, this time using the prefix \texttt{r}.
So \texttt{\%eax} is the lower 32 bits of \texttt{\%rax}.
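The nesting of \texttt{\%rax}, \texttt{\%eax} and \texttt{\%ax} can be sketched in \lC\ the same way. The sketch also includes one x86-64 detail: writing a 32-bit register (e.g. \texttt{movl \$1, \%eax}) zeroes the upper 32 bits of the 64-bit register, whereas writing a 16-bit or 8-bit register leaves the remaining bits unchanged. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reading a narrower name just selects the low bits of %rax. */
uint32_t read_eax(uint64_t rax) { return (uint32_t)rax; }
uint16_t read_ax(uint64_t rax)  { return (uint16_t)rax; }

/* On x86-64, writing %eax zero-extends: bits 63..32 of %rax become 0. */
uint64_t write_eax(uint64_t rax, uint32_t eax) {
    (void)rax;              /* the old upper half is discarded */
    return (uint64_t)eax;
}

/* Writing %ax keeps bits 63..16 of %rax unchanged. */
uint64_t write_ax(uint64_t rax, uint16_t ax) {
    return (rax & ~(uint64_t)0xFFFF) | ax;
}
```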

Additionally, 64-bit mode adds the registers \texttt{\%r8} through \texttt{\%r15}: with \texttt{X} standing for 8 to 15, \texttt{\%rX} is the full 64-bit register, \texttt{\%rXd} its lower 32 bits, and \texttt{\%rXw} and \texttt{\%rXb} its lower 16 and 8 bits.

\subsubsection{Instructions}
Instructions consist of a mnemonic (often three letters, e.g. \texttt{mov}, \texttt{add}, \texttt{lea}) followed by a one-letter suffix that indicates the operand size in bytes, e.g. \texttt{movq}.
The following suffixes are available: \texttt{b} (byte, 1 byte), \texttt{w} (word, 2 bytes), \texttt{l} (long word, 4 bytes) and \texttt{q} (quad word, 8 bytes).
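The suffixes map directly onto operand sizes. A small \lC\ sketch (\texttt{suffix\_bytes} is a hypothetical helper); the \lC\ types in the comments assume x86-64 Linux (the LP64 model), where e.g. an \texttt{int} is handled with \texttt{l} instructions and a \texttt{long} or pointer with \texttt{q} instructions:

```c
#include <assert.h>
#include <stddef.h>

/* Operand size in bytes for an AT&T size suffix; 0 for an unknown suffix.
 * C types in the comments assume x86-64 Linux (LP64). */
size_t suffix_bytes(char suffix) {
    switch (suffix) {
    case 'b': return 1; /* byte:      char */
    case 'w': return 2; /* word:      short */
    case 'l': return 4; /* long word: int */
    case 'q': return 8; /* quad word: long, pointers */
    default:  return 0;
    }
}
```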

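Source and destination operands can be registers (e.g. \texttt{\%rax}), immediates or memory addresses. In AT\&T syntax, the source comes first and the destination second, e.g. \texttt{movq \%rax, \%rbx} copies \texttt{\%rax} into \texttt{\%rbx}.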

\content{Immediates} A constant value (an immediate) is prefixed with \texttt{\$}, e.g. \texttt{\$42} (decimal) or \texttt{\$0x2A} (hexadecimal); \texttt{movq \$42, \%rax} sets \texttt{\%rax} to 42.

\content{Memory addresses} To treat a register as a memory address, put it in parentheses, e.g. \texttt{(\%rax)} interprets the value of \texttt{\%rax} as a memory address.
The instruction then reads (or writes) as many bytes at that address as its suffix specifies, e.g. \texttt{movl (\%rax), \%edx} loads 4 bytes.
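Since the suffix alone determines how many bytes are accessed, the same address can yield different values. The \lC\ sketch below (\texttt{load\_le} is a hypothetical helper) assembles \texttt{size} bytes in little-endian order, as x86 stores them: with the bytes \texttt{11 22 33 44} in memory, a 1-byte read (\texttt{movb}) gives \texttt{0x11} and a 2-byte read (\texttt{movw}) gives \texttt{0x2211}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads `size` bytes starting at mem, little-endian as on x86:
 * the byte at the lowest address is the least significant one. */
uint64_t load_le(const uint8_t *mem, size_t size) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    uint64_t value = 0;
    for (size_t i = 0; i < size; i++)
        value |= (uint64_t)mem[i] << (8 * i);
    return value;
}
```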

The full syntax for memory address modes is \texttt{D(Rb, Ri, S)}, where
\begin{itemize}[noitemsep]
    \item \texttt{D}: Displacement (constant offset), can be 0, 1, 2 or 4 bytes (not bits, if you are confused as I was)
    \item \texttt{Rb}: Base register (to which the offsets are added). Can be any of the 16 integer registers
    \item \texttt{Ri}: Index register: Any, except for \texttt{\%rsp} (and \texttt{\%rbp} is also rarely used)
    \item \texttt{S}: Scale factor: 1, 2, 4 or 8, typically the size of an array element
\end{itemize}
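The computation that happens is the following: \texttt{Mem[ Reg[Rb] + S * Reg[Ri] + D ]}.
Parts may be omitted, e.g. \texttt{8(\%rax)} or \texttt{(\%rax,\%rcx)}: a missing \texttt{D} or \texttt{Ri} counts as 0 and a missing \texttt{S} as 1.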
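The address computation can be written out in \lC\ as follows (\texttt{effective\_address} is a hypothetical helper; the displacement is sign-extended and all arithmetic wraps modulo $2^{64}$). For example, for an array \texttt{long a[]} whose start address is in \texttt{\%rdi} and an index \texttt{i} in \texttt{\%rsi}, the element \texttt{a[i]} is at \texttt{(\%rdi,\%rsi,8)}.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of D(Rb, Ri, S) = Reg[Rb] + S * Reg[Ri] + D.
 * d is at most 4 bytes and is sign-extended; arithmetic wraps mod 2^64. */
uint64_t effective_address(int32_t d, uint64_t rb, uint64_t ri, unsigned s) {
    assert(s == 1 || s == 2 || s == 4 || s == 8);
    return rb + (uint64_t)s * ri + (uint64_t)(int64_t)d;
}
```

With \texttt{\%rdi} $=$ \texttt{0x1000} and \texttt{\%rsi} $= 3$, \texttt{(\%rdi,\%rsi,8)} is \texttt{0x1018}, the address of \texttt{a[3]}.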
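The \texttt{lea src, dest} (load effective address) instruction performs this address computation but stores the resulting address itself in the register \texttt{dest} instead of accessing memory.
It can therefore be abused for arithmetic of the same form, e.g. \texttt{leaq (\%rdi,\%rdi,2), \%rax} sets \texttt{\%rax} to $3 \cdot$ \texttt{\%rdi}.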
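The following \lC\ functions show what two \texttt{lea} instructions compute. They are hypothetical illustrations; whether a compiler actually emits \texttt{lea} for such expressions depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stdint.h>

/* leaq (%rdi,%rdi,2), %rax  ->  rax = rdi + 2*rdi = 3*rdi (no memory access) */
uint64_t lea_times3(uint64_t rdi) {
    return rdi + 2 * rdi;
}

/* leaq 7(%rdi,%rsi,4), %rax  ->  rax = rdi + 4*rsi + 7 (no memory access) */
uint64_t lea_x_4y_7(uint64_t rdi, uint64_t rsi) {
    return rdi + 4 * rsi + 7;
}
```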