New DRAMs aim to ease
main-memory bottlenecks
New interface standards, on-chip caches, and banked architectures
all compete for places in future computation
BY DAVID BONDURANT
Enhanced Memory Systems
Colorado Springs, CO
Over the past three years, DRAM manufacturers have launched several
novel memory architectures aimed at overcoming the growing performance
drag imposed by conventional multiplexed memories. This drag affects both
main-memory and frame-buffer subsystems. The effectiveness of the various
approaches depends on both latency and bandwidth.
A small, simple SRAM cache memory was once sufficient to reduce DRAM
latency and bandwidth requirements, making main-memory speed nearly irrelevant
to system performance. New operating systems, databases, and graphic images
are now much larger than caches, and processors run at clock rates far
faster than the memory bus. Multitasking, multithreading operating systems
force frequent context switches, eroding what cache effectiveness remains.
Beyond fast-page mode
In 1992, DRAM companies began introducing specialized DRAM products
to close the growing performance gap. Today, numerous DRAM alternatives
have the architectural complexity of complete mainframe and supercomputer
memory subsystems on a single chip (see box, “Unraveling DRAM
performance claims”).
Companies such as Mitsubishi (Sunnyvale, CA) and Ramtron International's
subsidiary, Enhanced Memory Systems (Colorado Springs, CO), are among the
first to introduce specialized memory subsystem chips. The Mitsubishi Cached
DRAM (CDRAM) combines a traditional segmented 10-ns SRAM cache, a 60-ns
DRAM, and a synchronous interface on a single chip. This integration improves
latency on cache hits and widens the DRAM-to-cache bus for fast cache fills.
Enhanced Memory Systems' Enhanced DRAM (EDRAM) is the first single-chip
specialty memory to combine a 10-ns SRAM page-wide cache with a fast 25-ns
DRAM array and an even wider DRAM-to-cache bus. The EDRAM also includes
new control features to eliminate traditional DRAM precharge and refresh
penalties.
The JEDEC DRAM committee has defined a set of standard DRAMs targeted
at improving performance. The extended-data-output (EDO) DRAM is a simple
modification to the fast-page-mode DRAM.
EDO DRAM allows output data to be held while the next page-mode access
starts. This pipelining of data improves the burst speed of an otherwise-unchanged
fast-page-mode DRAM from 25 MHz to as high as 50 MHz.
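As a rough illustration of what that doubling means for a short transfer, the sketch below (plain Python arithmetic; the four-word burst length is an assumption, while the 25- and 50-MHz rates come from the figures above) compares the time to move a four-word burst at the two page-cycle rates.

```python
# Illustrative arithmetic only: time to move a short burst at the page-cycle
# rates quoted above (25 MHz fast-page mode vs. 50 MHz EDO). The four-word
# burst length is an assumption, not a figure from the article.

def burst_time_ns(words: int, page_rate_mhz: float) -> float:
    """Time to clock out `words` consecutive data words at the given rate."""
    cycle_ns = 1_000.0 / page_rate_mhz
    return words * cycle_ns

print(burst_time_ns(4, 25.0))   # fast-page mode: 4 x 40 ns = 160 ns
print(burst_time_ns(4, 50.0))   # EDO:            4 x 20 ns =  80 ns
```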
Companies such as Mosel-Vitelic (San Jose, CA) and Silicon Magic (Cupertino,
CA) are now pushing EDO burst speed to 66 MHz and cutting random-access time
to 35 ns. In a little over one year, EDO DRAM has become the most common
DRAM main memory in PC systems.
Another JEDEC EDO DRAM, the burst EDO (BEDO) DRAM, was defined by Micron
Technology (Boise, ID). This DRAM increases burst speed to 66 MHz by putting
a burst-address counter on chip. The BEDO DRAM has not succeeded in PC
systems because it failed to win Intel chipset support.
A third JEDEC memory is intended for speeds exceeding 100 MHz. The synchronous
DRAM (SDRAM) combines a higher level of pipelining, a fully synchronous
interface, multiple DRAM banks, and an optional low-voltage stub-series-terminated-logic
(SSTL) interface to provide high burst speed, but no significant
improvement in basic random-access speed. The SDRAM's multiple banks
allow the slow initial latency of the DRAM to be hidden when random accesses
to alternate DRAM banks overlap current data bursts.
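A toy timing model makes the overlap concrete. The 60-ns row latency and 40-ns burst time below are illustrative assumptions, not SDRAM datasheet values, and the model idealizes the overlap as complete.

```python
# Toy model of bank interleaving: when consecutive accesses go to alternate
# banks, the next bank's row activation proceeds while the current bank is
# still bursting, so only the first access pays the full row latency.
# Timing values are illustrative assumptions, not datasheet parameters.

ROW_LATENCY_NS = 60.0   # assumed activate-to-first-data time
BURST_NS = 40.0         # assumed time to stream one short data burst

def total_time_ns(accesses: int, interleaved: bool) -> float:
    if interleaved:
        # Idealized overlap: activation of the next bank is fully hidden
        # behind the previous burst.
        return ROW_LATENCY_NS + accesses * BURST_NS
    # Back-to-back accesses to the same bank serialize latency and burst.
    return accesses * (ROW_LATENCY_NS + BURST_NS)

print(total_time_ns(4, interleaved=False))  # 400.0 ns
print(total_time_ns(4, interleaved=True))   # 220.0 ns
```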
The 16-Mbit SDRAM has two banks. The 64-Mbit SDRAM increases the number
of banks to four. After a slow start, the SDRAM appears poised to become
the next standard DRAM for PC main memory during 1997. At the same time,
a variation of the SDRAM called the synchronous graphics RAM (SGRAM) provides
the high-burst-speed advantages of the SDRAM in a 256-K x 32-bit configuration
tailored for the commodity graphics market.
Rambus DRAM (RDRAM), a revolutionary DRAM developed by Rambus (Mountain
View, CA), extracts over 500 Mbytes/s of peak bandwidth from a single chip.
This high bandwidth per chip is especially suited to small memory systems
containing only a few memory chips, such as systems for graphics boards
and future low-end PC applications.
The RDRAM operates its low-voltage I/O interface at a 250-MHz synchronous
clock rate and clocks data on both edges of the clock. It multiplexes control,
address, and data over the same 8-bit-wide bus and provides two
separate DRAM banks on chip. The RDRAM interface is optimized for short
bus length and limited memory size.
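The peak figure follows directly from those numbers; here is a quick check using only the figures quoted above.

```python
# Peak transfer rate of the original RDRAM channel: an 8-bit bus clocked at
# 250 MHz, with data moved on both clock edges.
clock_mhz = 250
transfers_per_cycle = 2   # data on both rising and falling clock edges
bus_bytes = 1             # 8-bit multiplexed bus

peak_mbytes_per_s = clock_mhz * transfers_per_cycle * bus_bytes
print(peak_mbytes_per_s)  # 500 -> about 500 Mbytes/s, as quoted above
```

The same arithmetic gives 600 Mbytes/s at the 300-MHz clock rate of the second-generation parts described below.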
An unfortunate side effect of the highly multiplexed RDRAM architecture
is slower initial latency, which reduces the sustained bandwidth under
random-access or short-data-burst conditions. To improve this, Rambus has
defined a second-generation Concurrent Rambus DRAM that increases the number
of DRAM banks to four. It also changes the control interface to hide random-access
latency when accessing alternate DRAM banks, similar to SDRAM.
The new RDRAM products increase clock rate to 300 MHz and peak data
rate to 600 Mbytes/s. Rambus has succeeded in licensing the RDRAM to a
large number of manufacturing partners. RDRAM is expected to be a major
player in graphics applications and a contender in PC main memory if latency
issues can be resolved. Intel apparently believes these issues will be
resolved, because it has licensed the Rambus interface for its future motherboard
chipsets.
Multibank DRAM (MDRAM), developed by MoSys (Santa Clara, CA), combines
features from the SGRAM and the RDRAM to provide another primarily graphics-oriented
DRAM. The MDRAM uses a synchronous interface with standard LVTTL and SSTL
I/O levels, like the SDRAM.
The MDRAM clocks data on both clock edges, like RDRAM, and provides
more internal DRAM banks (up to 32). It clocks at up to 166 MHz and uses
a wider, 32-bit data bus to reach peak data rates similar to RDRAM's.
MDRAM is expected to be a competitor in graphics applications.
In 1996, Enhanced Memory Systems announced a family of Multibank EDO
EDRAM and Multibank Burst EDO EDRAM products. These products combine the
fast latency of the original EDRAM with four DRAM and cache banks, along
with EDO and burst EDO pipelining.
These feature multiple 10-ns SRAM caches and 25-ns DRAM banks to extend
EDO DRAM burst rates to 100 MHz while providing fast initial random-access
latency. The combination of low latency and high burst rate optimizes bandwidth
under short-burst conditions, such as in PC main memory or I/O buffer memories
at bus rates of 100 MHz or below.
At the end of 1996, Enhanced Memory Systems announced the development
of the Enhanced Synchronous DRAM (ESDRAM). This memory combines the features
of the low-latency multibank EDRAM with the synchronous interface and low-voltage
I/O of the SDRAM. This pin-, function-, and timing-compatible family of
Enhanced SDRAMs has 4-M x 4-bit, 2-M x 8-bit, and 1-M x 16-bit configurations.
ESDRAM will drop into existing SDRAM component or DIMM module footprints
while providing faster initial latency and higher sustained bandwidth at clock
rates of 83, 100, and 133 MHz. At 133 MHz, for example, a page access takes
two cycles rather than four, and a row access takes four cycles rather than
eight.
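Translating those cycle counts into time is straightforward; the sketch below simply divides by the 133-MHz clock quoted above.

```python
# Convert the cycle counts quoted above into nanoseconds at a 133-MHz clock.
cycle_ns = 1_000.0 / 133.0   # about 7.5 ns per clock

for label, esdram_cycles, sdram_cycles in [("page access", 2, 4),
                                           ("row access", 4, 8)]:
    print(f"{label}: ESDRAM {esdram_cycles * cycle_ns:.0f} ns, "
          f"SDRAM {sdram_cycles * cycle_ns:.0f} ns")
# page access: ESDRAM 15 ns, SDRAM 30 ns
# row access: ESDRAM 30 ns, SDRAM 60 ns
```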
More important, ESDRAM uses its dual SRAM page caches to eliminate the
latency of pipelined random accesses within a single bank or between banks.
This feature roughly doubles sustained bandwidth compared with SDRAM.
The ESDRAM has the potential to provide the next level of performance beyond
the standard SDRAM for main-memory applications.
Future challenges
The challenge facing DRAM architects will only grow. By the
year 2000, memory bus rates will likely increase from 100 MHz to 1 GHz.
Program size and complexity will also increase, driving the demand for
faster random-access speed and larger memory sizes. The next DRAM should
combine multiple fast-SRAM caches, fast-DRAM banks, synchronous-control
interfaces operating at low-voltage levels, and advanced programmable delay
lines to de-skew the system buses.
Will the next-generation DRAM look like the double data rate (DDR) SDRAM
or SCI-based Synclink DRAM (SLDRAM) now emerging from the JEDEC committee,
or will Intel define the next-generation DRAM with Rambus? DRAM architects
will need every trick in the book to keep pace with the microprocessor
of 2001.
Unraveling DRAM performance claims
The growing complexity of specialty DRAM memory alternatives makes it
increasingly difficult for system designers to compare memory performance
and its impact on final system performance. Yesterday's assumption about
SRAM caches, that high hit rates make DRAM performance unimportant, is
no longer true.
Much DRAM advertising and press coverage simplifies DRAM performance
by comparing only peak bandwidth or I/O clock rate. These simplifications
are valid only for systems with long burst lengths. A majority of today's
real-world systems require totally random data access or a random access
followed by a 4-word data burst.
A thorough comparison should also consider initial random-access
latency and sustained bandwidth under worst-case conditions. Here are the key
performance parameters for specialty DRAMs (see also the table):
Page-hit latency. Most DRAMs provide optimum initial latency
by caching a complete page of memory in the sense amplifiers of the DRAM.
This parameter is a measure of the random-access speed of data within a
page.
Page-miss latency. When a DRAM operates in page mode,
the page must remain open until it is determined that an access has missed
the current page. When a miss is detected, it is necessary to precharge
the DRAM and then perform a complete row access to the new page. Some new
specialty DRAMs reduce this latency by hiding precharge during burst transfers.
Row-access latency. If a DRAM operates with the page closed
after each cycle or burst, then a complete random access must be performed
at the beginning of each cycle. This method of DRAM access may be preferable
to page mode if the system has a low page hit rate and frequent random
accesses.
I/O data rate. This parameter is the data rate of a single
I/O pin.
Peak bandwidth. This parameter is the amount of data transferred
at the maximum I/O data rate for a given memory-bus configuration. It is
usually calculated for a typical system bus width, such as a
64-bit microprocessor bus. Peak bandwidth ignores the random-access time
needed to fetch the initial data from the DRAM (a rough calculation combining
these figures appears after this list).
Interleaved bank bandwidth. This is the maximum bandwidth
for data bursts with random accesses to alternate DRAM banks. Multibank architectures
such as SDRAM, SGRAM, RDRAM, MDRAM, and ESDRAM improve effective bandwidth
by hiding random-access latency to alternate banks during current data
bursts. This improvement applies only to alternate-bank accesses; random
accesses to the same bank cannot be hidden using multibank techniques.
Sustained bandwidth. This is a worst-case bandwidth calculation
assuming a totally random access to any bank of memory followed by a four-word
data burst at the maximum I/O rate. Because of the randomness and large
size of today's programs and databases, this may be the best measure of
actual system performance.
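The bandwidth figures defined above can be combined into a rough model. The sketch below assumes a 64-bit (8-byte) bus and the random-access-plus-four-word-burst pattern just described; the table's entries follow the vendors' own accounting and may not match this simplified formula exactly.

```python
# Rough sketch of the bandwidth figures defined above. Assumes a 64-bit
# (8-byte) memory bus and a random access followed by a four-word burst at
# the maximum I/O rate; the table's exact accounting may differ.

def peak_bandwidth(io_rate_mhz: float, bus_bytes: int = 8) -> float:
    """Peak bandwidth in Mbytes/s: data streamed at the full I/O rate."""
    return io_rate_mhz * bus_bytes

def sustained_bandwidth(latency_ns: float, io_rate_mhz: float,
                        burst_words: int = 4, bus_bytes: int = 8) -> float:
    """Worst-case estimate in Mbytes/s: pay the initial random-access
    latency, then move a short burst at the full I/O rate."""
    burst_ns = burst_words * 1_000.0 / io_rate_mhz
    bytes_moved = burst_words * bus_bytes
    return 1_000.0 * bytes_moved / (latency_ns + burst_ns)

# Example with illustrative values: 60-ns initial latency, 100-MHz I/O rate.
print(peak_bandwidth(100))                  # 800 Mbytes/s
print(round(sustained_bandwidth(60, 100)))  # 320 Mbytes/s
```

With those illustrative values, the model gives 800 Mbytes/s peak but only about 320 Mbytes/s sustained, showing how heavily short bursts discount the headline number.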
Specialty DRAM Performance
Memory type | Page-hit latency (ns) | Page-miss latency [1] (ns) | Row-access latency (ns) | I/O data rate (MHz) | Peak bandwidth [2] (Mbytes/s) | Interleaved bank bandwidth [2] (Mbytes/s) | Sustained bandwidth [2] (Mbytes/s)
Fast-page mode | 30 | 110 | 60 | 25 | 200 | N/A | 139
CDRAM [3] | 10 | 70 | 70 | 100 | 800 | N/A | 320
EDRAM [3] | 10 | 75 | 25 | 100 | 800 | N/A | 582
EDO | 30 | 90 | 60 | 40 | 320 | N/A | 194
Fast EDO | 20 | 60 | 35 | 66 | 533 | N/A | 305
Burst EDO | 25 | 90 | 53 | 66 | 533 | N/A | 237
SDRAM [3] | 30 | 70 | 60 | 100 | 800 | 800 | 320
SGRAM | 30 | 90 | 60 | 100 | 800 | 800 | 267
RDRAM [4] | 39 | 99 | 73 | 600 | 1,200 | 1,077 | 285
MDRAM | 24 | 90 | 64 | 200 | 1,600 | 1,600 | 305
Multibank EDO EDRAM | 12 | 30 | 25 | 100 | 800 | N/A | 582
Multibank burst-EDO EDRAM | 12 | 30 | 30 | 100 | 800 | N/A | 533
ESDRAM [5] | 12 | 12 | 30 | 133 | 1,064 | 1,064 | 853

[1] Page-miss latency is the time to access the next random data after a data burst.
[2] Bandwidth assumes a 64-bit memory bus, except for RDRAM.
[3] Page-miss latency reduced by hidden precharge.
[4] Bandwidth for two parallel Rambus channels.
[5] Page-miss latency reduced by hidden precharge and random access.
Products from the following companies are mentioned in this
article:
Enhanced Memory Systems
Colorado Springs, CO
Hotline 800-545-DRAM
Fax 719-488-9095
http://www.csn.net/ramtron/enhanced
Hitachi America
Brisbane, CA
Literature 800-285-1601, ext. 12
Fax 303-297-0447
http://www.hitachi.com
Hyundai Electronics America
San Jose, CA
http://www.hea.com/products
Micron Technology
Boise, ID
Lynnette Pixley 208-368-4400
Fax-on-demand 800-239-0337
http://www.micron.com
Mitsubishi Electronics
Sunnyvale, CA
Sherry Hill 408-774-3188
Mosel-Vitelic
San Jose, CA
Esther Hsieh 408-433-6025
http://www.moselvitelic.com
MoSys
San Jose, CA
Gary Banta 408-321-0777
Fax 408-321-0780
http://www.mosysinc.com
NEC
Santa Clara, CA
Literature Hotline 800-366-9782
Fax 800-729-9288
http://www.nec.com
Rambus
Mountain View, CA
Julia Cates 415-903-4725
Samsung Semiconductor
San Jose, CA
Vera Haire 408-954-7228
http://www.samsung.com
SCIzzL (SCI/SLDRAM)
Los Altos, CA
Dr. David Gustavson
Silicon Magic
Cupertino, CA
Angelo Matthews 408-366-8888
Texas Instruments
Dallas, TX
Semiconductor Group (SC-97002)
800-477-8924, ext. 4500
http://www.ti.com