Enhancing Uniprocessor Performance

In this chapter, we review techniques used to enhance the performance of a uniprocessor. A multiprocessor system or a parallel computer is composed of several uniprocessors and the performance of the entire system naturally depends, among other things, on the performance of the constituent uniprocessors. We also aim, in this chapter, to differentiate the techniques used to enhance uniprocessor performance from the techniques used to enhance multiprocessor performance, which are discussed in subsequent chapters.

Traditionally, building a computer was an expensive proposition. For almost 50 years, all effort went into designing faster single-computer systems. It typically takes a microprocessor manufacturer 2 years to come up with the next central processing unit (CPU) version [1]. For the sake of the following discussion, we define a simple computer or processor as consisting of the following major components:

1. controller to coordinate the activities of the various processor components;
2. datapath or arithmetic and logic unit (ALU) that does all the required arithmetic and logic operations;
3. storage registers, on-chip cache, and memory; and
4. input/output (I/O) and networking to interface and communicate with the outside world.

The above components are sometimes referred to as the computer resources. These resources are shared between the different programs or processes running on the computer, and the job of the computer operating system (OS) is to organize the proper sharing of, and access to, these resources. Processors were made to run faster through many techniques that enhance the datapath, since it is the heart of any processor. We discuss datapath enhancements in the following subsections.

INCREASING PROCESSOR CLOCK FREQUENCY

Increasing the system clock frequency allows the computer to execute more instructions per unit time. However, logic gates need time to switch states, and system buses need time to be charged or discharged through bus drivers. These delays are closely tied to the underlying silicon technology, such as NMOS, CMOS, or bipolar. The type of gate circuit, such as CMOS, domino logic, or current-mode logic, also dictates the clock speed. There is also a fundamental limit on how fast a chip can run, set by dynamic power dissipation. Dynamic power dissipation is given approximately by
$$p_{\mathrm{d}}=C f V^2,$$
where $C$ is the total parasitic capacitance, $f$ is the clock frequency, and $V$ is the power supply voltage. Engineers developed many techniques to reduce the power consumption of the chip while raising the clock frequency. One obvious solution was to reduce the value of $C$ through finer lithographic process resolution. A bigger impact resulted when the chip power supply voltage was reduced from $5.0 \mathrm{~V}$ to $2.2 \mathrm{~V}$ and then to $1.2 \mathrm{~V}$, because the power scales with the square of $V$. The open question is how far the supply voltage can keep scaling down without affecting the gate switching noise margin.
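
To make the quadratic dependence on supply voltage concrete, the following minimal Python sketch evaluates $p_{\mathrm{d}} = C f V^2$ at the three supply voltages mentioned above. The function name `dynamic_power` and the values of `C` and `F` are assumptions chosen only for illustration; they are not taken from the text or from any particular chip.

```python
def dynamic_power(c_farads, f_hz, v_volts):
    """Approximate dynamic power p_d = C * f * V^2, in watts."""
    return c_farads * f_hz * v_volts ** 2


# Hypothetical values, for illustration only (not from the text).
C = 1e-9  # total parasitic capacitance: 1 nF (assumed)
F = 1e9   # clock frequency: 1 GHz (assumed)

baseline = dynamic_power(C, F, 5.0)

# Supply voltages quoted in the text: 5.0 V, 2.2 V, and 1.2 V.
for v in (5.0, 2.2, 1.2):
    p = dynamic_power(C, F, v)
    print(f"V = {v:.1f} V: p_d = {p:.2f} W, {baseline / p:.2f}x below the 5.0 V case")

# At a fixed supply voltage, power grows only linearly with clock frequency.
print(f"2x clock at 1.2 V: {dynamic_power(C, 2 * F, 1.2):.2f} W")
```

With $C$ and $f$ held fixed, going from $5.0 \mathrm{~V}$ to $1.2 \mathrm{~V}$ reduces the dynamic power by a factor of $(5.0/1.2)^2 \approx 17.4$, whereas doubling the clock frequency at a fixed voltage only doubles it. This is why lowering the supply voltage had a bigger impact than the other measures.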


