Introduction to Parallel Processing
- In computers, parallel processing is the processing of program instructions by dividing them among multiple processors, with the objective of running a program in less time.
- In the earliest computers, only one program ran at a time. A computation-intensive program that took one hour and a tape-copying program that also took one hour would take a total of two hours to run.
- An early form of parallel processing allowed the interleaved execution of both programs together. The computer would start an I/O operation, and while it was waiting for the operation to complete, it would execute the processor-intensive program. The total execution time for the two jobs would be a little over one hour.
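A minimal sketch of this interleaving in Python, assuming time.sleep stands in for the tape I/O wait: while one thread blocks on the simulated I/O, the main thread keeps the processor busy, so the combined wall-clock time is close to the longer of the two jobs rather than their sum.

import threading
import time

def io_job():
    # Simulated tape copy: this thread spends its time waiting on I/O.
    time.sleep(2)

def cpu_job():
    # Simulated computation-intensive program.
    total = 0
    for i in range(10_000_000):
        total += i
    return total

start = time.time()
io_thread = threading.Thread(target=io_job)
io_thread.start()        # the I/O job begins waiting
cpu_job()                # the processor-intensive program runs meanwhile
io_thread.join()
print(f"elapsed: {time.time() - start:.2f} s")  # roughly max(I/O, CPU), not their sum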
Parallel Processing Systems
Parallel processing systems are designed to speed up the execution of programs by dividing a program into multiple fragments and processing these fragments simultaneously. Such systems are multiprocessor systems, also known as tightly coupled systems. Parallel systems deal with the simultaneous use of multiple computer resources, which can include a single computer with multiple processors, a number of computers connected by a network to form a parallel processing cluster, or a combination of both.
- Parallel computing is an evolution of serial computing in which jobs are broken into discrete parts that can be executed concurrently. Each part is further broken down into a series of instructions. Instructions from each part execute simultaneously on different CPUs.
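As a minimal sketch of this decomposition, assuming Python's standard multiprocessing module: one job (summing a range of numbers) is broken into discrete parts, each part is handed to a worker process, and the partial results are combined at the end.

import multiprocessing as mp

def part_sum(chunk):
    # Each worker executes this series of instructions on its own part.
    return sum(chunk)

if __name__ == "__main__":
    data = range(1_000_000)
    n = mp.cpu_count()
    size = len(data) // n + 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with mp.Pool(processes=n) as pool:
        partials = pool.map(part_sum, chunks)  # parts run concurrently on different CPUs
    print(sum(partials))                       # combine the partial results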
- Parallel systems are more difficult to program than computers with a single processor because the architecture of parallel computers varies widely and the processes of multiple CPUs must be coordinated and synchronized.
- The three models most commonly used in building parallel computers are synchronous processors each with its own memory, asynchronous processors each with its own memory, and asynchronous processors with a common, shared memory.
- Flynn classified computer systems based on the parallelism in their instruction and data streams. These are (a short sketch contrasting two of them follows the list):
1. Single instruction stream, single data stream (SISD).
2. Single instruction stream, multiple data stream (SIMD).
3. Multiple instruction stream, single data stream (MISD).
4. Multiple instruction stream, multiple data stream (MIMD).
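As a loose illustration of two of these categories, assuming Python with NumPy installed (NumPy is not part of the standard library): a vectorized array operation applies a single instruction across multiple data elements, in the spirit of SIMD, while separate processes each running their own instruction stream over their own data approximate MIMD.

import multiprocessing as mp
import numpy as np

def square(x):
    # One instruction stream: compute a square.
    return x * x

def cube(x):
    # A different instruction stream: compute a cube.
    return x * x * x

if __name__ == "__main__":
    # SIMD-style: one operation applied element-wise across many data items.
    a = np.arange(8)
    print(a + a)                   # [ 0  2  4  6  8 10 12 14]

    # MIMD-style: different instruction streams on different data, in separate processes.
    with mp.Pool(2) as pool:
        r1 = pool.apply_async(square, (3,))
        r2 = pool.apply_async(cube, (4,))
        print(r1.get(), r2.get())  # 9 64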
Why Use Parallel Computing?
Main Reasons:
- Save time and/or money.
- Solve larger problems: e.g., web search engines and databases processing millions of transactions per second.
- Provide concurrency.
- Use of non-local resources.
Limits to serial computing:
o Transmission speeds - the speed of a serial computer is directly dependent upon how fast data can move through hardware; the transmission limit of copper wire is about 9 cm/nanosecond.
o Limits to miniaturization.
o Economic limitations - it is increasingly expensive to make a single processor faster.
Current computer architectures are increasingly relying upon hardware-level parallelism to improve performance:
- Multiple execution units
- Pipelined instructions
- Multi-core
Multiple execution units
- In computer engineering, an execution unit (also called a functional unit) is a part of the central processing unit (CPU) that performs the operations and calculations as instructed by the computer program.
- It may have its own internal control sequence unit (not to be confused with the CPU's main control unit), its own registers, and other internal units such as an arithmetic logic unit (ALU), a floating-point unit (FPU), or smaller, more specific components.
Instruction pipelining
- Instruction pipelining is a technique that implements a form of parallelism called instruction-level parallelism within a single processor.
- It therefore allows faster CPU throughput (the number of instructions that can be executed in a unit of time) than would otherwise be possible at a given clock rate.
- The basic instruction cycle is broken up into a series of steps called a pipeline. Rather than processing each instruction sequentially (finishing one instruction before starting the next), each instruction is split into a sequence of steps so that different steps can be executed in parallel and instructions can be processed concurrently (starting one instruction before finishing the previous one).
- Pipelining increases instruction throughput by performing multiple operations at the same time, but it does not reduce instruction latency (the time to complete a single instruction from start to finish), since each instruction must still go through all the steps.
- Thus, pipelining increases throughput at the cost of latency, and is frequently used in CPUs but avoided in real-time systems, in which latency is a hard constraint.
- Each instruction is split into a sequence of dependent steps. The first step is always to fetch the instruction from memory; the final step is usually writing the results of the instruction to processor registers or to memory. Pipelining seeks to let the processor work on as many instructions as there are dependent steps, just as an assembly line builds many vehicles at once rather than waiting until one vehicle has passed through the line before admitting the next one.
- The term pipeline is an analogy: just as there is fluid in each link of a pipeline, each part of the processor is occupied with work.
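A minimal sketch of this trade-off, assuming an idealized four-stage pipeline (fetch, decode, execute, write-back) in which every stage takes exactly one clock cycle: each instruction still needs four cycles from start to finish (latency), but once the pipeline is full, one instruction completes every cycle (throughput).

STAGES = 4  # fetch, decode, execute, write-back

def completion_cycle(i, pipelined):
    # Clock cycle at which instruction i (0-based) finishes.
    if pipelined:
        return STAGES + i        # fill the pipeline once, then one result per cycle
    return STAGES * (i + 1)      # each instruction runs start-to-finish alone

n = 8
for pipelined in (False, True):
    total = completion_cycle(n - 1, pipelined)
    print(f"pipelined={pipelined}: {n} instructions take {total} cycles; "
          f"latency is still {STAGES} cycles per instruction")

Running this prints 32 cycles for the serial case and 11 for the pipelined one, while the per-instruction latency stays at four cycles in both.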
Multi-core
- In consumer technologies, multi-core is usually the term used to describe two or more CPUs working together on the same chip.
- Also called multicore technology, it is a type of architecture in which a single physical processor contains the core logic of two or more processors, packaged into a single integrated circuit (IC). This single integrated circuit is called a die.
- Multi-core can also refer to multiple dies packaged together. Multi-core enables the system to perform more tasks with greater overall system performance.
- Multi-core technology can be used in desktops, mobile PCs, servers, and workstations. Compare dual-core, a single chip containing two separate processors (execution cores) in the same IC.
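As a minimal sketch of putting multiple cores to work from a program, assuming Python's standard os and concurrent.futures modules: the program reports how many cores the machine exposes and spreads independent CPU-bound tasks across them with a process pool.

import os
from concurrent.futures import ProcessPoolExecutor

def busy(n):
    # A small CPU-bound task; each copy can run on its own core.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    print("cores available:", os.cpu_count())
    with ProcessPoolExecutor() as pool:        # worker count defaults to the core count
        results = list(pool.map(busy, [10**6] * 4))
    print(len(results), "tasks completed in parallel")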