diff --git a/semester3/spca/parts/03_hw/05_exceptions.tex b/semester3/spca/parts/03_hw/05_exceptions.tex
index 091c671..8cf952d 100644
--- a/semester3/spca/parts/03_hw/05_exceptions.tex
+++ b/semester3/spca/parts/03_hw/05_exceptions.tex
@@ -60,5 +60,5 @@ For example,
 \begin{itemize}
     \item \textbf{Interrupts} are actions like network data arrival or hitting a key on the keyboard
     \item \textbf{Hard Reset Interrupts} are executed by hitting the system reset button
-    \item \textbf{Soft Reset Interrupts} ate caused by, for example, hitting \verb|CTRL|+\verb|ALR|+\verb|DEL|
+    \item \textbf{Soft Reset Interrupts} are caused by, for example, hitting \verb|CTRL|+\verb|ALT|+\verb|DEL| (on Windows)
 \end{itemize}
diff --git a/semester3/spca/parts/03_hw/06_multicore/00_background.tex b/semester3/spca/parts/03_hw/06_multicore/00_background.tex
new file mode 100644
index 0000000..9e1a99e
--- /dev/null
+++ b/semester3/spca/parts/03_hw/06_multicore/00_background.tex
@@ -0,0 +1,25 @@
+\newpage
+\subsection{Multi-Core}
+\subsubsection{Background}
+In the early days of computer hardware it was fairly easy to get higher performance due to the rapid advances in transistor technology.
+This trend is described by what is known as Moore's Law (i.e. that the transistor count of integrated circuits doubles roughly every two years).
+However, due to power constraints and the slowing down of advances in transistor technology,
+the transistor count growth has slowed down quite a bit since the beginning of the century and is predicted to further stagnate as time goes on.
+
+This leads to various issues; among others, CPU performance is no longer increasing as quickly as it used to.
+Additionally, due to power constraints, building faster and faster single-core CPUs is no longer possible and the advances in that field have slowed to a crawl.
+
+To mitigate and offset these issues, manufacturers started to add multiple cores to parallelize operations.
+This, however, brings a whole host of new issues with it: for example, how do you make sure that no data races occur,
+how do you schedule work across the cores, etc.? These questions have mostly been answered in the course Parallel Programming, so we will not cover them here.
+
+The only reason transistor count is still growing at a seemingly constant rate today is that manufacturers manage to cram more and more cores into a CPU.
+But even that has slowed down in recent years.
+
+While in 2019 the highest core count AMD EPYC CPU (i.e. the EPYC 7742 from the Rome family) had 64 Zen 2 cores,
+in 2025 the highest core count EPYC CPU (i.e. the EPYC 9965 from the EPYC Turin Dense family) had 192 Zen 5c cores,
+where the highest core count CPU with full cores was the EPYC 9755 (from the EPYC Turin family), which had 128 Zen 5 cores.
+
+The way they manage this while not hitting the power wall is by making the CPUs physically larger.
+While a consumer Ryzen 9 9950X3D (the fastest consumer CPU at the time of writing) easily fits into the palm of even a small hand,
+an EPYC Turin CPU is so large that it covers most of even a big hand.
diff --git a/semester3/spca/parts/03_hw/06_multicore/01_limitations.tex b/semester3/spca/parts/03_hw/06_multicore/01_limitations.tex
new file mode 100644
index 0000000..7affc0d
--- /dev/null
+++ b/semester3/spca/parts/03_hw/06_multicore/01_limitations.tex
@@ -0,0 +1,21 @@
+\subsubsection{Limitations}
+\content{The Power Wall} More and more transistors need more and more power, thus leading to power delivery and dissipation becoming an issue.
+To compute the power dissipation, use the formula $P_{diss} = P_{dyn} + P_{leak} + P_{short}$,
+where $P_{dyn} = C V^2 f$ (with $C$ the capacitance, $V$ the supply voltage and $f$ the processor frequency) is the dynamic power,
+$P_{leak}$ the leakage power (see DDCA) and $P_{short}$ the short circuit power while switching.
+
+At some point the chip becomes almost impossible to cool. A great example of a CPU series that suffers from this is the Intel Raptor Lake CPUs.
+The Intel Core i9-14900K is notorious for drawing almost 300 watts on a very small chip and thus running very hot.
+
+Thus, to further increase performance, chip designers are trying to make the hardware more efficient, which allows them to further boost performance with extra power headroom.
+
+\content{The Memory Wall} Between 1985 and 2005, CPU performance increased on average by 55\% a year, whereas memory throughput only increased by roughly 10\% a year.
+Thus, performance has increasingly become limited by memory rather than by pure CPU performance, and memory access remains the largest overhead in most applications to this day.
+
+\content{The ILP Wall} While it is possible to improve single core performance using instruction-level parallelism,
+this has been thoroughly exhausted and is not a feasible way to significantly improve CPU performance.
+
+Around 2003, all of these walls were hit simultaneously: manufacturers hit the power wall and thus could not clock the processors any higher,
+memory access times were the limiting factor and ILP was almost completely exhausted, as not enough parallel instructions existed in code.
+
+Current trends are a reduction in clock frequency in favour of more parallelism in the hardware, e.g. by providing more cores, or better caching, branch prediction, etc.
diff --git a/semester3/spca/parts/03_hw/06_multicore/02_consistency-coherencey.tex b/semester3/spca/parts/03_hw/06_multicore/02_consistency-coherencey.tex
new file mode 100644
index 0000000..3b1de2a
--- /dev/null
+++ b/semester3/spca/parts/03_hw/06_multicore/02_consistency-coherencey.tex
@@ -0,0 +1,9 @@
+\subsubsection{Coherency and Consistency}
+\fancydef{Coherency} The values in the caches all match each other and the processors all see a coherent view of the memory.
+
+\fancydef{Consistency} The order in which changes are seen by different processors is consistent.
+
+Most modern systems' CPU cores are cache coherent, i.e. the system behaves as if all cores access a single memory array.
+This has one big advantage: it is easy to program. However, it is hard to implement in hardware and memory is also slower as a result.
+
+Memory consistency, on the other hand, is not standardized across companies.
diff --git a/semester3/spca/parts/03_hw/06_multicore.tex b/semester3/spca/parts/03_hw/06_multicore/03_sync.tex
similarity index 100%
rename from semester3/spca/parts/03_hw/06_multicore.tex
rename to semester3/spca/parts/03_hw/06_multicore/03_sync.tex
diff --git a/semester3/spca/parts/03_hw/06_multicore/04_smp.tex b/semester3/spca/parts/03_hw/06_multicore/04_smp.tex
new file mode 100644
index 0000000..e69de29
diff --git a/semester3/spca/parts/03_hw/06_multicore/05_numa.tex b/semester3/spca/parts/03_hw/06_multicore/05_numa.tex
new file mode 100644
index 0000000..e69de29
diff --git a/semester3/spca/parts/03_hw/06_multicore/06_optim.tex b/semester3/spca/parts/03_hw/06_multicore/06_optim.tex
new file mode 100644
index 0000000..e69de29
diff --git a/semester3/spca/spca-summary.pdf b/semester3/spca/spca-summary.pdf
index 44be5e2..baee382 100644
Binary files a/semester3/spca/spca-summary.pdf and b/semester3/spca/spca-summary.pdf differ
diff --git a/semester3/spca/spca-summary.tex b/semester3/spca/spca-summary.tex
index 8c9a40b..91052d2 100644
--- a/semester3/spca/spca-summary.tex
+++ b/semester3/spca/spca-summary.tex
@@ -152,7 +152,13 @@ If there are changes and you'd like to update this summary, please open a pull r
 \input{parts/03_hw/03_caches.tex}
 \input{parts/03_hw/04_virtual-memory.tex}
 \input{parts/03_hw/05_exceptions.tex}
-\input{parts/03_hw/06_multicore.tex}
+\input{parts/03_hw/06_multicore/00_background.tex}
+\input{parts/03_hw/06_multicore/01_limitations.tex}
+\input{parts/03_hw/06_multicore/02_consistency-coherencey.tex}
+\input{parts/03_hw/06_multicore/03_sync.tex}
+\input{parts/03_hw/06_multicore/04_smp.tex}
+\input{parts/03_hw/06_multicore/05_numa.tex}
+\input{parts/03_hw/06_multicore/06_optim.tex}
 \input{parts/03_hw/07_dev.tex}