DSP chips accelerate image and graphics processing
The compute-intensive nature of these applications begs for the use of
such mathematical powerhouses
BY ROSS MITCHELL
Spectrum Signal Processing Systems
Burnaby, B.C., Canada
Until recently, the limited throughput and memory-access capabilities of
typical digital signal processor (DSP) chips precluded their use in
image-processing and graphics systems. However, advances in DSP
performance, coupled with architectural enhancements that permit use of
DSP chips alongside other devices, have changed all this. Now DSP chips
are considered a cost-effective programmable alternative to traditional
dedicated hardware or expensive array-processing solutions. With its
new-found high-speed and parallel-processing capabilities, the DSP chip
has found a natural home in the compute-intensive, memory-devouring
environment of graphics and image processing.
A typical digital
image-processing system comprises an acquisition device, a manipulation
and/or data fusion computer, and an output and/or storage device. Image
data may come from a multitude of input sources, such as video cameras,
charge-coupled devices, ultrasound imagers, and computerized tomography
scanners, or from the radiometric backscatter of a radar sensor. The image
is then
digitized for computer processing by the DSP chip, or for digital storage.
The most common output device for a digital image-processing system is a
CRT display, either in gray level or color. The output image is constructed
and delivered to the display for visual presentation and operator
interaction. Other output media include thermal or laser printers and film
recorders.
The DSP chip performs application-specific algorithms to
transform the input image for display or automatic control. The processing
of the data varies dramatically between different input sensors and
applications. The processing algorithms require an understanding of the
physical properties of the input data source, mathematical
signal-processing theory, and proven heuristics. Examples of typical
imaging applications and the DSP functions performed are shown in Table 1.
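One DSP function that recurs across imaging applications is two-dimensional convolution, which this article later lists among the matrix operations used in image processing. The following is a minimal sketch in Python, not code for any particular DSP; the function name, the list-of-lists image layout, and the choice to copy border pixels unchanged are all assumptions made for illustration.

```python
def convolve3x3(image, kernel):
    """Convolve a 2-D list of gray levels with a 3 x 3 kernel.

    Assumption for this sketch: border pixels are copied unchanged,
    so only interior pixels are filtered.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # start from a copy; borders stay as-is
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    # True convolution: the kernel is flipped relative
                    # to the image neighborhood.
                    acc += kernel[i][j] * image[r + 1 - i][c + 1 - j]
            out[r][c] = acc  # nine multiply-accumulates per output pixel
    return out


# Example: a 3 x 3 averaging (smoothing) kernel applied to a small image.
smooth = [[1 / 9] * 3 for _ in range(3)]
picture = [[10, 10, 10, 10],
           [10, 50, 50, 10],
           [10, 50, 50, 10],
           [10, 10, 10, 10]]
smoothed = convolve3x3(picture, smooth)
```

Each output pixel costs nine multiplications and eight additions, which is why the inner loop maps naturally onto a hardware multiplier/accumulator.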
Graphics applications
Graphics display systems are architecturally
similar to image-processing systems. A graphics-processing computer and
a video control (display) computer access a large multiport video RAM
buffer for image manipulation and presentation, respectively. Typically
using a list memory, an on-board DSP performs specific functions on the
data. These functions include line and polygon drawing, oval drawing,
rotation and scaling, area filling, object projection, 3-D rendering, and
ray tracing and illumination. Like image-processing algorithms,
graphics-processing algorithms are computationally intensive and require
processing in two or three dimensions. This can quadruple
computing-resource requirements. The DSP
chip must possess a combination of fast memory access (to the bit-map
pixel memory) and processing horsepower to handle the volume of matrix
arithmetic.
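Rotation and scaling, two of the functions listed above, reduce to the kind of matrix arithmetic the paragraph describes. The sketch below is an illustration only, written in Python rather than DSP assembly; the function names and the convention of scaling first and rotating second are assumptions, not taken from any vendor library.

```python
import math


def rotation_scale_matrix(angle_deg, sx, sy):
    """Return a 2 x 2 matrix that scales by (sx, sy), then rotates.

    The matrix is R * S, with R = [[c, -s], [s, c]] and S = diag(sx, sy).
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c * sx, -s * sy],
            [s * sx, c * sy]]


def transform_points(matrix, points):
    """Apply a 2 x 2 matrix to each (x, y) point in a list."""
    (m00, m01), (m10, m11) = matrix
    # Two multiplies and one add per output coordinate.
    return [(m00 * x + m01 * y, m10 * x + m11 * y) for x, y in points]


# Example: double the size of a unit square and rotate it by 30 degrees.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = transform_points(rotation_scale_matrix(30, 2, 2), square)
```

Every vertex costs four multiplications and two additions, so the work grows linearly with the number of points in the display list.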
DSP implementation
DSP chips that are designed to handle this sort of
processing include the TMS320C30 and the recently introduced TMS320C40
(which builds on the C30 and adds six ports for communicating with other
C40s) from Texas Instruments; the DSP96002 from Motorola; the ADSP-21020
from Analog Devices; the DSP32C from AT&T; and the i860XP from Intel Corp.
(for more detailed information on these processors, see Table 2). These
devices satisfy many, or all, of the needs of advanced graphics and imaging
applications, including a computational throughput of more than 50 MFLOPS.
Also, each is capable of accessing large banks of inexpensive memory and
of applying multiprocessor resources. Several features of the DSP96002,
the TMS320C30, and the TMS320C40 make them ideally suited to imaging and
graphics applications:
* Single-instruction parallel floating-point multiplier/accumulator for
matrix and vector arithmetic.
* Addressing modes for retrieving and storing vectors and matrices.
* Internal memories for efficient workspace operations.
* DMA controllers and dual bus structures for high throughput of
coordinate data.
Typically, the two buses are used to partition the input data, the output
data, or both, so the processor can simultaneously retrieve
multiple input data streams while delivering output data. Previously,
these hardware features were implemented in microcode, or inside CISC or
RISC devices. Sometimes they were handled by software, which resulted in
slow and inefficient program execution. The processing power of DSP chips
is essential for handling the large amounts of matrix arithmetic (such as
dot products, matrix inversions, and convolution) used in graphics and
image processing. A typical product of a 4 x 4 matrix and a 4 x 1 vector
(four 4-element dot products) requires a total of 16 multiplications and
12 additions. The hardware multiplier and accumulator in a DSP can reduce
the instructions needed by 40% or more, because each multiply and its
accompanying add execute as a single instruction. In a TMS320C30/40, this
translates into 20 instruction cycles, yielding a kernel throughput on the
order of 1 million matrix-vector multiplies/s (assuming a clock period of
20 ns). The instruction count can be reduced even further by using
straight-line code instead of loops.
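The operation count quoted above can be checked with a short sketch. The Python below is illustrative only and does not model any DSP's instruction set; the function names are hypothetical, and the cycle time passed to kernel_rate is an assumed parameter rather than a figure for a specific device.

```python
def matvec4_counted(m, v):
    """Multiply a 4 x 4 matrix by a 4 x 1 vector, counting operations."""
    mults = adds = 0
    result = []
    for row in m:
        acc = row[0] * v[0]       # first product starts the accumulator
        mults += 1
        for k in range(1, 4):
            acc += row[k] * v[k]  # one multiply and one add per later term
            mults += 1
            adds += 1
        result.append(acc)
    return result, mults, adds


def matvec4_unrolled(m, v):
    """Straight-line version: same arithmetic, no loop bookkeeping."""
    return [
        m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2] + m[0][3] * v[3],
        m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2] + m[1][3] * v[3],
        m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2] + m[2][3] * v[3],
        m[3][0] * v[0] + m[3][1] * v[1] + m[3][2] * v[2] + m[3][3] * v[3],
    ]


def kernel_rate(cycles_per_kernel, cycle_time_ns):
    """Kernels per second for a cycle count and an assumed cycle time."""
    return 1e9 / (cycles_per_kernel * cycle_time_ns)


identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
_, mults, adds = matvec4_counted(identity, [1.0, 2.0, 3.0, 4.0])
# mults is 16 and adds is 12, matching the counts in the text.
```

The resulting rate scales inversely with the instruction cycle time: 20 cycles at a hypothetical 50 ns per instruction cycle works out to 1 million kernels/s, and a shorter cycle raises the figure proportionally.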
Multiprocessing
Board-level products incorporating multiple
floating-point devices are available from several vendors for the PC/AT
and the VMEbus. Multiprocessor architectures use on-chip communications
ports (and an internal DMA controller), dual memories, mezzanine buses, or
cache coherency logic to provide efficient processor-to-processor
communication. Image-processing algorithms, in general, lend themselves
very well to partitioning the processing among parallel-processing
elements. Typical partitioning includes decomposition of the image into
subimages and allocating the processing of the subimages to individual
processing elements, as well as partitioning the data on a frame-by-frame
basis. An example of a DSP-based computer architecture for image
processing is shown in the figure. Video information is received and
captured by a video interface board comprising A/D converters for either
color or composite input and video memory for image buffering. The image
is distributed over a very-high-speed video bus to multiple DSP chips
working together to process the data. Video bus implementations include
backplane buses, fiber-optic networks, and mezzanine buses. Typical
data-transfer rates are 80 Mbytes/s or greater. The video bus must support
low-overhead burst-mode transfer of data from processor to memory and
processor to processor. Because processing elements can be added as needed,
processing resources can be matched to application requirements. The
processing
elements deliver the output data to an output image display buffer for
user presentation and interaction.
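The two partitioning schemes described above, subimage decomposition and frame-by-frame allocation, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, the subimages are plain rectangular tiles with no overlap, and frames are assigned round robin, which is only one of several possible policies.

```python
def partition_into_tiles(height, width, tiles_down, tiles_across):
    """Split an image into rectangular subimages.

    Returns (row_start, row_end, col_start, col_end) bounds, with end
    indices exclusive. Integer division spreads any remainder so the
    tiles cover the image exactly once.
    """
    tiles = []
    for i in range(tiles_down):
        r0 = i * height // tiles_down
        r1 = (i + 1) * height // tiles_down
        for j in range(tiles_across):
            c0 = j * width // tiles_across
            c1 = (j + 1) * width // tiles_across
            tiles.append((r0, r1, c0, c1))
    return tiles


def assign_frames(num_frames, num_processors):
    """Frame-by-frame partitioning: frame k goes to processor k mod N."""
    return {p: list(range(p, num_frames, num_processors))
            for p in range(num_processors)}


# Example: a 480 x 512 image split among four processing elements.
tiles = partition_into_tiles(480, 512, 2, 2)
schedule = assign_frames(5, 2)
```

Neighborhood operations such as filtering would also need each tile to carry a few border pixels from its neighbors; that overlap is omitted here to keep the sketch short.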
CAPTION:
Typical image partitioning includes decomposition of the image into
subimages and allocating the processing of the subimages to individual
processing elements, as well as partitioning the data on a frame-by-frame
basis.