\newpage
\subsubsection{Rounding}

The basic idea of floating-point operations is:
\begin{enumerate}
    \item Compute the exact result
    \item Round it so that it fits the desired precision
\end{enumerate}

\textit{IEEE Standard 754} specifies $4$ rounding modes: \textit{Towards Zero, Round Down, Round Up, Nearest Even}.

The default is \textit{Nearest Even}\footnote{Changing the rounding mode is rarely done in practice. C99 provides \texttt{fesetround} in \texttt{fenv.h} for this, otherwise it usually requires Assembly.}, which rounds to the closer of the two neighbouring representable numbers, like regular rounding, but picks the neighbour whose least significant bit is $0$ (the even one) if the exact result lies exactly in the middle.
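
As a small worked illustration (the values are chosen here purely for illustration, rounding to whole numbers), only exact ties are resolved towards the even neighbour, all other cases behave like regular rounding:
\begin{align*}
    10.01_2 = 2.25 & \to 10_2 = 2  & & \text{(below the middle, round down)} \\
    10.11_2 = 2.75 & \to 11_2 = 3  & & \text{(above the middle, round up)}   \\
    10.10_2 = 2.5  & \to 10_2 = 2  & & \text{(tie, $10_2$ is even)}          \\
    11.10_2 = 3.5  & \to 100_2 = 4 & & \text{(tie, $100_2$ is even)}
\end{align*}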

The rounding decision depends on three bits derived from the \textit{exact} result: $G$, $R$ and $S$
$$
    a = 1.B_1B_2\ldots B_{n - 2}B_{n - 1}\underbrace{G}_{\text{Guard}}\underbrace{R}_{\text{Round}}
    \underbrace{X_1X_2\ldots X_{k - 1}X_k}_{\text{Sticky}}
$$
where $n$ is the number of bits in the mantissa of the format (e.g. $3$ in the above example of an $8$-bit floating-point number).

\begin{enumerate}
    \item \textbf{Guard Bit} $G$ is the least significant bit that is kept in the (rounded) result (i.e. it is $B_n$)
    \item \textbf{Round Bit} $R$ is the first bit that is cut off by rounding
    \item \textbf{Sticky Bit} $S$ is the logical OR of all remaining cut-off bits $X_i$
\end{enumerate}
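
To illustrate, take $n = 3$ and the exact value $138 = 10001010_2 = 1.0001010_2 \cdot 2^7$. Splitting the fraction according to the scheme above gives:
$$
    1.\underbrace{00}_{B_1B_2}\underbrace{0}_{G}\underbrace{1}_{R}\underbrace{010}_{X_1X_2X_3}
    \quad\Longrightarrow\quad
    G = 0, \quad R = 1, \quad S = 0 \lor 1 \lor 0 = 1
$$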

Based on these bits, the rounding decision can be made (the kept part is incremented if either expression evaluates to true):
\hrmvspace
\begin{align*}
    \text{Round up: } R \land S
     & &
    \text{Round to even: } G \land R \land \lnot S
\end{align*}

\drmvspace
Note that the special round-to-even condition only applies if the sticky bit is not set, i.e. if the exact result is an exact tie. If the sticky bit is set, the exact result lies above the middle and the round-up condition applies.
An easy way to implement the combined condition is as follows
\mint{c}+(sticky && round) || (!sticky && round && guard)+
Because \texttt{\&\&} and \texttt{||} short-circuit in C, evaluation stops as soon as the result is known: the second parenthesised term is only evaluated if the first one is false, which may save a few operations.
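
As a sanity check, the condition can be wrapped in a small C function and compared against the algebraically equivalent form $R \land (S \lor G)$ for all eight combinations of $G$, $R$ and $S$. This is only a sketch: the function names \texttt{should\_increment}, \texttt{should\_increment\_short} and \texttt{check\_equivalence} are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition from the text: increment the kept bits if this returns true. */
bool should_increment(bool guard, bool round, bool sticky)
{
    return (sticky && round) || (!sticky && round && guard);
}

/* Hypothetical shorter form: R AND (S OR G). */
bool should_increment_short(bool guard, bool round, bool sticky)
{
    return round && (sticky || guard);
}

/* Asserts that both forms agree on all 8 combinations of G, R and S. */
void check_equivalence(void)
{
    for (int bits = 0; bits < 8; bits++) {
        bool guard  = (bits >> 2) & 1;
        bool round  = (bits >> 1) & 1;
        bool sticky = bits & 1;
        assert(should_increment(guard, round, sticky)
               == should_increment_short(guard, round, sticky));
    }
}
```

Calling \texttt{check\_equivalence()} aborts if the two forms ever disagree. The shorter form reads: increment only if the round bit is set and the cut-off part is either above the middle or an exact tie next to an odd kept value.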

\content{Example} Rounding exact $8$-bit values to an $8$-bit floating-point format with $3$ mantissa bits (i.e. $4$ significant bits including the leading $1$):

\renewcommand{\arraystretch}{1.2}
\begin{center}
    \begin{tabular}{|c|c|c|c|c|}
        \hline
        \textbf{Value} & \textbf{Fraction} & \textbf{GRS} & \textbf{Incr?} & \textbf{Rounded} \\
        \hline
        $128$ & $1.000|0000$ & $000$ & N & $1.000$ \\
        $13$ & $1.101|0000$ & $100$ & N & $1.101$ \\
        $17$ & $1.000|1000$ & $010$ & N & $1.000$ \\
        $19$ & $1.001|1000$ & $110$ & Y & $1.010$ \\
        $138$ & $1.000|1010$ & $011$ & Y & $1.001$ \\
        $63$ & $1.111|1100$ & $111$ & Y & $10.000$ \\
        \hline
    \end{tabular}
\end{center}
\renewcommand{\arraystretch}{1.0}

\textbf{Post-Normalization}: Rounding may cause the mantissa to overflow (e.g. $1.111 \to 10.000$ for $63$ above). In this case, shift the mantissa right once and increment the exponent.
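
The complete procedure for the example above (normalize, extract $G$, $R$, $S$, increment, post-normalize) can be sketched in C as follows. This is a sketch under assumptions, not a reference implementation: the names \texttt{struct rounded}, \texttt{round\_to\_4\_bits} and \texttt{rounded\_value} are hypothetical, inputs are non-zero $8$-bit integers, and the result is kept as a $4$-bit mantissa plus an unbiased exponent instead of being packed into an actual floating-point encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct rounded {
    unsigned mantissa; /* 4 significant bits 1BBB, i.e. 1.BBB without the point */
    int exponent;      /* unbiased exponent, value = 1.BBB * 2^exponent */
    bool incremented;  /* whether the kept bits were incremented */
};

struct rounded round_to_4_bits(uint8_t value)
{
    assert(value != 0); /* zero has no leading 1 to normalize on */

    /* Normalize: shift left until the leading 1 sits in bit 7. */
    unsigned bits = value;
    int exponent = 7;
    while ((bits & 0x80u) == 0) {
        bits <<= 1;
        exponent--;
    }

    /* Layout now: bit 7 = leading 1, bits 6..4 = B1 B2 G, bit 3 = R, bits 2..0 = X. */
    unsigned kept = bits >> 4;
    bool guard  = kept & 1u;
    bool round  = (bits >> 3) & 1u;
    bool sticky = (bits & 0x7u) != 0; /* logical OR of the remaining bits */

    bool incr = (sticky && round) || (!sticky && round && guard);
    if (incr)
        kept++;

    /* Post-normalization: 1.111 + 0.001 = 10.000, so shift right, exponent + 1. */
    if (kept & 0x10u) {
        kept >>= 1;
        exponent++;
    }
    return (struct rounded){ kept, exponent, incr };
}

/* Converts the rounded result back to an integer for easy comparison. */
unsigned rounded_value(struct rounded r)
{
    return r.exponent >= 3 ? r.mantissa << (r.exponent - 3)
                           : r.mantissa >> (3 - r.exponent);
}
```

For $63$ this yields the mantissa $1.000$ with exponent $6$ (i.e. $64$), because the increment carries out of the mantissa and post-normalization is applied. For $19$ it yields $1.010$ with exponent $4$ (i.e. $20$), matching the table.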