## INTRODUCTION

The idea of a single-processor computer is fast becoming archaic and quaint. We now have to adjust our strategies when it comes to computing:

• It is impossible to improve computer performance using a single processor. Such a processor would consume an unacceptable amount of power. It is more practical to attain the desired performance with many simple processors, perhaps thousands of them [1].
• As a result of the above observation, if an application is not running fast on a single-processor machine, it will run even slower on new machines unless it takes advantage of parallel processing.
• Programming tools that can detect parallelism in a given algorithm have to be developed. The dependence among an algorithm's variables can be regular or irregular. In either case, there is room for speeding up the algorithm's execution, provided that some subtasks can run concurrently and the correctness of execution can still be assured.
• Optimizing future computer performance will hinge on good parallel programming at all levels: algorithms, program development, operating system, compiler, and hardware.
• Any assessment of the benefits of parallel computing needs to take into account the number of processors deployed as well as the overhead of processor-to-processor and processor-to-memory communication. Compute-bound problems are ones wherein potential speedup depends on the speed of execution of the algorithm by the processors. Communication-bound problems are ones wherein potential speedup depends on the speed of supplying the data to and extracting the data from the processors.
• Memory systems are still much slower than processors, and their bandwidth is also limited to one word per read/write cycle.
• Scientists and engineers will no longer adapt their computing requirements to the available machines. Instead, it will become practical for them to adapt the computing hardware to meet their computing requirements.
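
The distinction between compute-bound and communication-bound problems in the list above can be illustrated with a toy timing model. The sketch below is an assumption made for illustration, not a model taken from the text: it supposes the computation divides evenly among `n_procs` processors and that communication adds a fixed cost `t_comm_per_proc` for every additional processor. The function and parameter names are hypothetical.

```python
def speedup(n_procs, t_compute, t_comm_per_proc=0.0):
    """Speedup under a toy model (an assumption, not from the text):

    T(N) = t_compute / N + t_comm_per_proc * (N - 1)
    """
    t_serial = t_compute
    t_parallel = t_compute / n_procs + t_comm_per_proc * (n_procs - 1)
    return t_serial / t_parallel


if __name__ == "__main__":
    # Compare a communication-free (compute-bound) case with one where
    # communication cost grows with the number of processors.
    for n in (1, 4, 16, 64):
        ideal = speedup(n, t_compute=100.0)
        with_comm = speedup(n, t_compute=100.0, t_comm_per_proc=0.5)
        print(f"N={n:3d}  compute-bound: {ideal:6.2f}  with communication: {with_comm:6.2f}")
```

Under these assumptions, the communication-free case scales linearly with the number of processors, while the case with communication overhead peaks and then degrades as processors are added.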

This book is concerned with algorithms and the special-purpose hardware structures that execute them, since software and hardware issues impact each other. Any software program ultimately runs on, and relies upon, the underlying hardware support provided by the processor and the operating system. Therefore, we start this chapter with some definitions, then move on to discuss some relevant design approaches and design constraints associated with this topic.

## TOWARD AUTOMATING PARALLEL PROGRAMMING

We are all familiar with the process of algorithm implementation in software. When we write code, we do not need to know the details of the target computer system since the compiler will take care of them. However, we are steeped in thinking in terms of a single central processing unit (CPU) and sequential processing when we start writing the code or debugging the output. On the other hand, the processes of implementing algorithms in hardware or in software for parallel machines are more related than we might think. Figure 1.1 shows the main phases or layers of implementing an application in software or hardware using parallel computers. Starting at the top, layer 5 is the application layer, where the application or problem to be implemented on a parallel computing platform is defined. The specifications of the inputs and outputs of the application being studied are also defined here. Some input/output (I/O) specifications might be concerned with where data is stored and the desired timing relations of data. The results of this layer are fed to the lower layer to guide the algorithm development.

Layer 4 is algorithm development to implement the application in question. The computations required to implement the application define the tasks of the algorithm and their interdependences. The algorithm we develop for the application might or might not display parallelism at this stage, since we are traditionally used to linear execution of tasks. At this stage, we should not be concerned with task timing or task allocation to processors. It might be tempting to decide these issues, but this is counterproductive since it might preclude some potential parallelism. The result of this layer is a dependence graph, a directed graph (DG), or an adjacency matrix that summarizes the task dependences.
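
As a minimal sketch of what the output of layer 4 might look like, the following builds an adjacency matrix for a small directed graph of task dependences. The five tasks and their edges are hypothetical, chosen only for illustration, and the convention that entry `A[u][v] = 1` means "task `v` depends on the output of task `u`" is an assumption rather than the book's notation.

```python
# Hypothetical five-task algorithm. An edge (u, v) means task v
# needs the output of task u before it can start.
TASKS = ["t0", "t1", "t2", "t3", "t4"]
EDGES = [(0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]


def adjacency_matrix(n_tasks, edges):
    """Return an n_tasks x n_tasks 0/1 matrix with A[u][v] = 1 for each edge u -> v."""
    a = [[0] * n_tasks for _ in range(n_tasks)]
    for u, v in edges:
        a[u][v] = 1
    return a


A = adjacency_matrix(len(TASKS), EDGES)
n = len(TASKS)

# Tasks with an all-zero column have no predecessors (algorithm inputs).
input_tasks = [TASKS[j] for j in range(n) if all(A[i][j] == 0 for i in range(n))]
# Tasks with an all-zero row have no successors (algorithm outputs).
output_tasks = [TASKS[i] for i in range(n) if all(A[i][j] == 0 for j in range(n))]

if __name__ == "__main__":
    for row in A:
        print(row)
    print("inputs:", input_tasks)
    print("outputs:", output_tasks)
```

Note that the matrix records only which tasks depend on which. It says nothing about when a task runs or on which processor, consistent with deferring those decisions to the next layer.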

Layer 3 is the parallelization layer, where we attempt to extract latent parallelism in the algorithm. This layer accepts the algorithm description from layer 4 and produces thread timing and assignment to processors for software implementation. Alternatively, this layer produces task scheduling and assignment to processors for custom very large-scale integration (VLSI) hardware implementation. The book concentrates on this layer, which is shown within the gray rounded rectangle in the figure.
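
One simple way to turn a dependence graph into task timing and processor assignment is an as-soon-as-possible (ASAP) schedule. The sketch below is offered only as an illustration and is not necessarily the technique the book develops. It assumes every task takes one time step and that enough processors are available. The task graph and the function names are hypothetical.

```python
from collections import defaultdict


def schedule_levels(n_tasks, edges):
    """Give each task the earliest time step at which all its predecessors are done.

    Assumes unit task duration and an acyclic graph; raises ValueError on a cycle.
    """
    successors = defaultdict(list)
    indegree = [0] * n_tasks
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    level = [0] * n_tasks
    ready = [t for t in range(n_tasks) if indegree[t] == 0]
    visited = 0
    while ready:
        u = ready.pop()
        visited += 1
        for v in successors[u]:
            # A task can start only after its latest predecessor finishes.
            level[v] = max(level[v], level[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if visited != n_tasks:
        raise ValueError("dependence graph contains a cycle")
    return level


def assign_processors(level):
    """Map each time step to {task: processor}, numbering processors from 0."""
    stages = defaultdict(list)
    for task, step in enumerate(level):
        stages[step].append(task)
    return {step: {task: proc for proc, task in enumerate(tasks)}
            for step, tasks in sorted(stages.items())}


if __name__ == "__main__":
    # Hypothetical graph: edge (u, v) means task v depends on task u.
    edges = [(0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
    levels = schedule_levels(5, edges)
    print("time step per task:", levels)
    print("assignment:", assign_processors(levels))
```

Tasks that land in the same time step have no dependences among them and can run concurrently. The widest step gives the number of processors this particular schedule needs.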
