Multimedia ICs advance in
both high and consumer ends
A new media processor takes the field,
while entry-level systems now do 3-D acceleration
BY RODNEY MYRVAAGNES
Associate Editor
Multimedia ICs have continued to attract considerable engineering investment
over the past year, but no one architecture has become dominant. Multimedia
extensions have appeared in more microprocessor instruction sets, including
non-Intel x86 chips.
The specialized architectures for high-end acceleration are all still
in the running, with a new one coming. And single-chip accelerators for
consumer PCs now do 3-D acceleration, rather than the 2-D their predecessors
handled a year ago.
New high-end architecture
A new architecture has appeared to compete for the high-end media-processor
market. Bops (Palo Alto, CA)–a licensing firm formed to continue the M-fast
project discontinued by IBM–has launched the Manifold Array (ManArray),
an array-processor architecture that is extensible in both torus and hypercube
topologies.
By licensing its design to semiconductor makers in the same way that
Advanced RISC Machines (Los Gatos, CA) does, for example, the company hopes
to enter the same market as the Chromatic Research (Sunnyvale, CA) Mpact
and the Philips Semiconductors (Sunnyvale, CA) Trimedia programmable media
processors. Bops will retain control of the instruction set, which all
licensees will be contracted to follow. Unlike Chromatic, Bops will publish
the instruction set so that anyone can write applications or develop tools.
The ManArray structure is derived from a fully connected 4 x 4 torus
of processing elements (PEs). A series of row and column transpositions
ends with each row of four cells containing two cells with their respective
transposes. For example, cells (2,0), (1,1), (0,2), and (3,3) come together.
Furthermore, all communications between rows are now either north and
west or south and east. The row of four is physically put into a 2 x 2
block to make the basic die design, with the external connections combined
in a single bus from both the north-and-west and the south-and-east wires.
Bringing all cells adjacent to their transposes reduces the data delay
between them to one cycle. Many common DSP processes–such as FFTs, matrix
transpositions and multiplications, and discrete cosine transforms (DCTs)–transpose
data elements and thus benefit from the ManArray topology. When 2 x 2 clusters
are joined to make larger clusters, PE-to-PE communication between adjacent
clusters remains one cycle.
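The regrouping can be checked with a few lines of Python. This is only a sketch: it assumes the clusters are the wrapped anti-diagonals of the torus, that is, that cell (row, col) lands in cluster (row + col) mod 4. That rule is inferred from the example cells above and is not taken from Bops documentation, and the function names are hypothetical.

```python
# Toy model of the 4 x 4 torus regrouping described above.
# ASSUMPTION: clusters are the wrapped anti-diagonals, (row + col) mod 4.
# The rule is inferred from the article's example, not from Bops documents.

N = 4  # the torus is N x N


def cluster_of(row, col, n=N):
    """Return the index of the cluster that PE (row, col) is assigned to."""
    return (row + col) % n


def build_clusters(n=N):
    """Group every PE of the n x n torus by its cluster index."""
    clusters = {k: [] for k in range(n)}
    for row in range(n):
        for col in range(n):
            clusters[cluster_of(row, col, n)].append((row, col))
    return clusters


def torus_neighbors(row, col, n=N):
    """Return the four wrap-around neighbors of PE (row, col)."""
    return {
        "north": ((row - 1) % n, col),
        "south": ((row + 1) % n, col),
        "west": (row, (col - 1) % n),
        "east": (row, (col + 1) % n),
    }


clusters = build_clusters()
```

Under this assumption, cluster 2 holds exactly the cells named in the example, every cell shares a cluster with its transpose, and the north and west neighbors of any cell fall in one adjacent cluster while the south and east neighbors fall in the other. That is consistent with the north-and-west and south-and-east wiring described in the text.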
The PEs are joined and controlled by a sequence processor (SP). The
SP, a superset of the PE, includes a fully connected crossbar (cluster
switch) joining the four PEs on the chip (see Fig. 1). When multiple
chips are used in SIMD mode, the SP on one of the chips can control PEs
on other chips that are executing the same instruction stream.
Fig. 1. The basic ManArray building block is a 2 x 2 cluster of
processing elements, controlled by a sequence processor and a cluster switch.
Alternatively, when a series of operations must be done on each member
of a large data array, PEs or groups of PEs can be connected sequentially
as an ad-hoc pipeline. This reconnection is entirely software controlled,
using the SPs, and can be done in the same physical machine as the torus
or hypercube SIMD.
The internal logic of the PE has load-store capability like a normal
microprocessor, but it also has the capability of storing reusable lines
of instructions, grouped into encapsulated very-long-instruction words
(eVLIWs) (see Fig. 2). When an algorithm has been debugged and its
operation is well understood, up to five instructions that can execute
simultaneously may be usefully grouped into each VLIW and stored in the
VLIW-instruction memory (VIM). Then, each time the VLIW is needed, it is
invoked by a single 32-bit instruction in the input stream.
Fig. 2. Instructions for the ManArray's processing elements come
in 32-bit words, but groups of operations that can execute simultaneously
may be stored in the VLIW memory and rerun by a single instruction.
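The store-once, invoke-many idea behind the VIM can be illustrated with a toy software model. Everything in the sketch below is hypothetical: the class name, the register dictionary, and the way operations are represented are assumptions made for illustration, since the actual ManArray instruction encoding is not described here. Only the limit of five instructions per VLIW comes from the text.

```python
# Toy model of the VLIW-instruction memory (VIM) idea described above.
# ASSUMPTION: every name here (Vim, load, execute, the register dict) is
# hypothetical; the real ManArray instruction encoding is not given.

MAX_SLOTS = 5  # the article says up to five instructions per VLIW


class Vim:
    """Stores groups of operations that are later invoked by address."""

    def __init__(self):
        self.lines = {}

    def load(self, address, operations):
        """Store one VLIW: a list of (dest_register, function) pairs."""
        if len(operations) > MAX_SLOTS:
            raise ValueError("a VLIW holds at most five instructions")
        self.lines[address] = list(operations)

    def execute(self, address, registers):
        """Run all slots of one VLIW as if they executed simultaneously.

        Every slot reads the register values from before the VLIW started,
        and the results are written back together afterwards.
        """
        before = dict(registers)
        results = {dest: func(before) for dest, func in self.lines[address]}
        registers.update(results)
        return registers


vim = Vim()
# One stored VLIW: a multiply, an add, and a register copy.
vim.load(0, [
    ("r2", lambda r: r["r0"] * r["r1"]),
    ("r3", lambda r: r["r0"] + r["r1"]),
    ("r0", lambda r: r["r1"]),
])
regs = vim.execute(0, {"r0": 3, "r1": 4, "r2": 0, "r3": 0})
```

The single call to `execute(0, ...)` plays the role of the one 32-bit instruction in the input stream: it names a stored line, and all of that line's slots run against the same starting register values.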
Programming the ManArray can start with sequential code like any DSP
or microprocessor. When that is debugged, packed data types may be introduced
where appropriate.
Examination of the code will identify combinations of execution units
used inside loops. These can then be grouped into eVLIWs that initialize,
run, and empty a pipeline.
The initial ManArray chip, called Kittyhawk, will be a 2 x 2 cluster.
It is expected to be introduced in the first half of 1998.
Second-generation media processor
Meanwhile, the Chromatic Mpact family has continued to advance. Toshiba
America (Irvine, CA) is in production with the second-generation Mpact2/6000
media processor. It is twice as fast as its predecessor, and is said to
perform 6,000 MOPS.
Like earlier Mpact processors, the new chip uses Mpact mediaware software
modules. They work in conjunction with a host x86 processor, with or without
MMX extensions. Functions for which software is currently available include
MPEG-1 and MPEG-2 video, wave-table audio, and 2-D and 3-D graphics acceleration,
as well as full DVD playback.
The chip includes six ALU groups, two Rambus channels, a 4-Kbyte data
cache, a 2-Kbyte instruction cache, and a 2-Kbyte texture cache, as well
as a 230-MHz RAMDAC.
Mpact demonstration centers are set up in Toshiba sales offices in San
Jose, CA; Irvine, CA; and Wakefield, MA. The chip is packaged in a 352-pin
BGA. According to the maker, the complete bill of materials for an add-on
board with full graphics and DVD playback is well under $100.
Digital camera image processor
The Raptor chipset from Sierra Imaging (Scotts Valley, CA) incorporates
the logic of a digital camera based on either a CMOS or CCD sensor. Besides
the Raptor chip itself, the set includes a Sparclite microprocessor and
a small serial-interface microcontroller, both from Fujitsu. The
microcontroller handles power management and user I/O operations.
Targeted at midrange digital photography, the Raptor chipset has a DSP
facility that can do 240 million multiply-accumulate operations/s, allowing
it to process up to 58 Mpixels/s. The processing power is used to improve
image quality and remove artifacts found in low-end digital images. Running
company-supplied software, the chip handles any DCT-based compression/decompression
algorithms, including JPEG 2000 when that becomes finalized.
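The two throughput figures quoted above imply a per-pixel processing budget, which a short calculation makes explicit. This is back-of-envelope arithmetic only: the 1-Mpixel frame size used in the second step is a hypothetical value chosen for illustration, not a Raptor or sensor specification.

```python
# Back-of-envelope budget from the two throughput figures quoted above.
MAC_RATE = 240e6    # multiply-accumulate operations per second (from the article)
PIXEL_RATE = 58e6   # peak pixels processed per second (from the article)

# Average number of MAC operations available for each pixel at peak rate.
macs_per_pixel = MAC_RATE / PIXEL_RATE


def seconds_per_frame(pixels, pixel_rate=PIXEL_RATE):
    """Time to push one frame of the given size through at the peak rate."""
    return pixels / pixel_rate


# ASSUMPTION: a 1-Mpixel frame, chosen only as an illustrative size.
frame_time = seconds_per_frame(1_000_000)

print(f"{macs_per_pixel:.2f} MACs per pixel")
print(f"{frame_time * 1000:.1f} ms per hypothetical 1-Mpixel frame")
```

At the peak pixel rate the DSP has roughly four multiply-accumulates to spend on each pixel, so heavier filtering would have to run at a lower pixel rate.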
In addition to its processing capacity, the Raptor chip has control
generators and interfaces for the sensing system and an in-camera LCD.
It also features an NTSC/PAL output, a serial port to a PC host, control
signals for the Sparclite and memory bus, and an interface to the small
microcontroller (see Fig. 3).
Fig. 3. The Raptor chip has interfaces to all outside functions,
including a power management system.
Software controls the interface behavior, as well as the image processing
itself. Thus, the sequencer for the sensor can handle different resolutions,
either by stepping the A/D converter used with a CCD or by stepping a CMOS
sensor directly.
Memory-bus and arbitration-control sections interface to 8-, 16-, or
32-bit DRAMs as well as to flash cards used for both image storage and
boot ROMs. Using 32-bit EDO DRAM, the memory interface has a 120-Mbyte/s
bandwidth. Samples are available now. The chipset, including the two Fujitsu
parts, is $30 each in lots of 50,000.
Virtual surround sound
The MED25008 TruSurround from Medianix Semiconductor (Mountain View,
CA) is an all-digital device that takes its input from a Dolby Pro Logic decoder.
The chip decodes the surround-sound signals in the same way a normal surround-sound
processor would, and it generates signals for rear speakers.
But it also applies the TruSurround algorithms licensed from SRS Labs
(Santa Ana, CA) to the surround-sound signals to generate signals for a
pair of stereo speakers. Because the entire surround-sound field is reduced
to two channels at the listener's ears, it should be possible to generate
any sound a listener could perceive with only a pair of stereo speakers,
as long as the listener's head doesn't move. The resulting
signals, given proper speaker and central listener location, effectively
duplicate the surround-sound effect.
The ability to localize virtual sound sources in 3-D space depends on
being in a central area in front of and between the stereo speakers. The
SRS method is said to allow some head movement without losing the illusion.
Consumer products using the chip can be made upgradable from two speakers
to the six required for normal surround sound.
The MED25008 comes in an 80-pin PQFP, costs $15.95
each in lots of 1,000, and is available now.
The company also offers the MED25018, which includes the '25008 functionality
and adds BBE processing, licensed from BBE Sound (Huntington Beach, CA).
The chip costs $16.95 each in lots of 1,000.
Another chip, the MED25009, is functionally similar to the '25008, but
uses Spatializer N-2-2 processing software from Spatializer Audio Laboratories
(Woodland Hills, CA) instead of TruSurround. The '25009 costs $14.95 each
in lots of 1,000. Two designer's kits are available for evaluation and
prototyping either chip: the EVB25008/9-3 3-D demonstration board, for
$195, and the EVB25008/9-5 evaluation board, for $295. The evaluation board
operates in both 3-D and six-channel Dolby Pro configurations.
DVD processing
The TroikaCSS from Oak Technology (Sunnyvale, CA) is a DVD decoder and
presentation engine that supports the content-scrambling technology mandated
by the DVD council. It integrates MPEG-2 and MPEG-1 video decompression,
Dolby Digital, MPEG-1, and Linear PCM audio decompression, subpicture decoding,
audio/video synchronization, on-screen display, and demultiplexing along
with copy protection.
The TroikaCSS is the fourth member of the company's DVD family. System-level
reference designs are available. The chip comes in a 160-pin QFP for $30
each in lots of 10,000, and samples are available now.
Portable AGP graphics
Trident Microsystems (Mountain View, CA) and Samsung Semiconductor (San
Jose, CA) have jointly developed the Cyber9520, a 3-D flat-panel graphics
accelerator implementing the accelerated graphics port (AGP) specified
by Intel (Santa Clara, CA). The chip integrates a single-cycle 3-D graphics
engine with 2 Mbytes of 100-MHz synchronous DRAM. Three-dimensional features
include Z-buffering, Gouraud shading, alpha-blending, specular lighting,
fog, and dithering.
The Cyber9520 provides dual/simultaneous display, with multiple views
on a flat panel or separately on a flat panel, CRT, or TV set. It also
includes eight power management functions, and is capable of shutting down
the graphics processors dynamically without software control, and shutting
down blocks in response to pin signals.
The chip supports 66- and 133-MHz AGPs with sideband support for fetching
texture maps from main memory. The chip costs $42 each in lots of 10,000,
and samples will be available this quarter.
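The two AGP rates translate into peak transfer ceilings with simple arithmetic. The sketch below assumes a 32-bit (4-byte) AGP data path, a width that is not stated in this article, and it uses the nominal 66- and 133-MHz figures, so the results are approximate upper bounds and not measured texture-fetch rates.

```python
# Peak AGP transfer-rate estimate from the clock rates quoted above.
# ASSUMPTION: a 32-bit (4-byte) AGP data path; the article does not state it.
AGP_BUS_BYTES = 4


def peak_mbytes_per_s(effective_mhz, bus_bytes=AGP_BUS_BYTES):
    """Peak rate in Mbytes/s for a bus moving bus_bytes per effective clock."""
    return effective_mhz * bus_bytes


# Nominal effective clock rates from the article, in MHz.
rates = {mhz: peak_mbytes_per_s(mhz) for mhz in (66, 133)}

for mhz, rate in rates.items():
    print(f"{mhz}-MHz AGP: about {rate} Mbytes/s peak")
```

Sustained rates for texture fetches from main memory would be lower than these ceilings.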
Aiming squarely at the same market, the LynxE (SM810) multimedia accelerator
from Silicon Motion (San Jose, CA) (see Fig. 4) offers features
similar to the Trident part with a 192-bit acceleration engine, a 135-MHz
RAMDAC, and power management. It does not have the embedded RAM of the
Trident chip. The LynxE is packaged in a 256-pin BGA and costs $42 each
in lots of 1,000. It is available now.
Fig. 4. The LynxE 192-bit graphics accelerator brings 3-D graphics
and power management to portable displays.
In the desktop area, the Trio3D from S3 (Santa Clara, CA) aims at business
PC graphics, with emphasis on 2-D performance. It features a 128-bit pipelined
architecture, support for a 125-MHz SGRAM frame buffer, and the company's
burst command interface, a protocol that works with either PCI or AGP.
The chip implements the new standard Video Interface Port providing
a dedicated connection to digital video devices. It also provides an easy
interface to peripherals such as video cameras, TV tuners, and DVD/MPEG-2
decoders. The part is packaged in either a 208-pin PQFP or a 336-pin BGA.
Samples are available now, and the part will cost $22 each in lots of 10,000,
with production in the second quarter.
The following manufacturers supplied information for this article:
Bops
Palo Alto, CA
Daniel Eakins 650-324-2440
http://www.bops.com
Medianix Semiconductor
Mountain View, CA
Don Philips 650-960-7081, ext. 225
Fax 650-960-0478
http://www.medianix.com
Oak Technology
Sunnyvale, CA
Vince Guaglianone 408-737-0888
Fax 408-737-3838
http://www.oaktech.com
S3 Inc.
Santa Clara, CA
Paul Crossley 408-588-8664
Fax 408-980-5444
http://www.S3.com
Sierra Imaging
Scotts Valley, CA
Barbara Matthews 408-461-2070, ext. 220
http://www.sierraimaging.com
Silicon Motion
San Jose, CA
Tom Kao 408-467-9388, ext. 396
Fax 408-467-9390
http://www.siliconmotion.com
Toshiba America
San Jose, CA
Hotline 800-879-4963
http://www.toshiba.com/taec
http://www.mpact.com/taec
Trident Microsystems
Mountain View, CA
Peter Brown 415-943-3761
http://www.tridentmicro.com