High-performance DRAMs

Manufacturers have developed several techniques to keep DRAMs in step
with ever-faster microprocessors

BY DAVID W. BONDURANT, Ramtron International Corp., Colorado Springs, CO

Semiconductor makers have introduced several high-performance DRAM
products in the last year to bridge the growing speed gap between
processors and the DRAM main memory. Microprocessor clock rates have
increased dramatically over the last two years. The typical 32-bit
microprocessor now runs at clock rates of 25 to 33 MHz, and
high-performance RISC microprocessors have clock rates ranging from 100 to
200 MHz. At these clock rates, processors demand raw memory bandwidths of
100 to 1,600 Mbytes/s and random-access latencies of 5 to 40 ns. The
conventional dynamic RAM, with its 70-ns random-access time and its
100-Mbyte/s 32-bit bandwidth, cannot support these processors without
seriously degrading performance.
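
A quick back-of-the-envelope check shows where these numbers come from. The sketch below is illustrative only: it assumes the processor wants one full-width word per clock and that a 70-ns DRAM sustains roughly 40-ns page-mode cycles, both simplifying assumptions rather than vendor figures.

    /* Back-of-the-envelope bandwidth check (illustrative assumptions only):
     * one full-width word demanded per CPU clock, and a 70-ns DRAM that
     * sustains ~40-ns page-mode cycles of 32 bits each.
     */
    #include <stdio.h>

    int main(void)
    {
        double cpu_low  = 25e6  * 4.0 / 1e6;   /* 25-MHz, 32-bit CPU:  100 Mbytes/s   */
        double cpu_high = 200e6 * 8.0 / 1e6;   /* 200-MHz, 64-bit CPU: 1,600 Mbytes/s */
        double dram     = 4.0 / 40e-9 / 1e6;   /* 32-bit page-mode DRAM: ~100 Mbytes/s */

        printf("CPU demand, low end : %5.0f Mbytes/s\n", cpu_low);
        printf("CPU demand, high end: %5.0f Mbytes/s\n", cpu_high);
        printf("Conventional DRAM   : %5.0f Mbytes/s\n", dram);
        return 0;
    }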
To address these mismatches, several DRAM products announced in the last
year integrate one or more advanced memory architecture techniques on
chip. In addition, one of the new products uses an advanced DRAM process
that reduces the basic access and cycle times of the DRAM significantly.
The new DRAM products are:
* The enhanced DRAM (EDRAM)
* The cached DRAM (CDRAM)
* The synchronous DRAM (SDRAM)
* The Rambus DRAM (RDRAM)

Enhanced DRAM

The enhanced DRAM (EDRAM), introduced by Ramtron International Corp. (Colorado Springs, CO), retains the asynchronous operation, +5-V power supply, CMOS or TTL I/O levels, and SOJ or SIMM packaging of the standard DRAM. Its performance is significantly increased by an improved DRAM process and by architectural techniques. The EDRAM is available in 4-M x 1-bit and 1-M x 4-bit organizations with write-per-bit capability. It is packaged in a standard 28-pin SOJ package and in upward-compatible 1-M x 32-bit and 1-M x 36-bit SIMM modules (see Fig. 1).

The EDRAM's 0.76-micron CMOS process features a patented field-shield isolation technique to reduce parasitic capacitance and increase transistor gain. As a result, the EDRAM has a DRAM row-access time of 35 ns, a read/write-cycle time of 65 ns, and a page-write cycle time of 15 ns.

The EDRAM also adds an on-chip static RAM cache, a write-posting register, and additional control lines that allow the SRAM cache and the DRAM to operate independently. The 2-Kbit, 15-ns cache is integrated directly into the column decoder of the DRAM array and caches one row at a time. This allows the EDRAM to operate like a standard DRAM, with either page-mode or static-column-mode access to any location within a row in 15 ns (2.7 times faster than 70-ns standard DRAMs). On a cache miss, the EDRAM loads a new row into the cache and outputs the selected location in 35 ns.

The EDRAM can access data within a page in either page mode or static-column mode. It has separate output-enable (/G) and column-address-latch (/CAL) inputs that replace /CAS in a normal DRAM. /CAL can either be held high to allow static-column access in 15 ns from an address change or be clocked to latch the address if the address is not stable throughout the cycle. /G independently gates the read data to the output pins.

The EDRAM can hide DRAM precharge cycles during burst reads from cache. Its separate chip-select (/S) and row-enable (/RE) inputs replace /RAS in a normal DRAM. As a result, /RE need not be enabled to read the cache (cache hits), and /RE can be brought high to precharge the DRAM array as soon as new data are loaded from the DRAM into the SRAM cache on a cache-miss cycle. During burst reads, the 25-ns precharge cycle completes before the next burst memory access. Conventional DRAMs in page mode must perform both precharge and row access on a page miss, resulting in a 130-ns page-miss access time, compared with the EDRAM's 35-ns cache-miss access time.

The EDRAM has a refresh pin (/F) that allows refresh cycles during reads from cache. In addition, both selected and unselected memory banks can refresh at the same time. This allows refresh cycles during burst read cycles, entailing no wait states for refreshing. A standard DRAM would require a CAS-before-RAS (CBR) refresh cycle that could not be hidden and would lose the current page contents of the refreshed banks, forcing page misses after a refresh. The EDRAM eliminates refresh wait states and maintains cache contents after refresh cycles.

During write cycles, the EDRAM writes directly to the DRAM. An on-chip last-row-read (LRR) register and a hit/miss comparator compare the write row address to the last row read (the cache address). If a write hit occurs, the EDRAM also updates the selected location in the SRAM cache to maintain coherency. The EDRAM can post the first write in a page in 15 ns, using the /CAL input to latch the address and /WE to latch the data. During burst write operations, the EDRAM can achieve a write-cycle time of 15 ns per word. At the end of a write cycle, it is possible to read the cache in parallel with write precharge.

Ramtron plans to offer a 512-K x 8-bit version of the EDRAM during 1993.
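
A controller's-eye view of the read path makes the hit/miss behavior concrete. The sketch below is only a model built from the figures above (15-ns cache hit, 35-ns miss, hidden precharge); the single last-row-read register and the C interface are my own simplification, not Ramtron's implementation.

    /* Illustrative model of EDRAM read latency. The on-chip last-row-read
     * (LRR) register remembers which row is in the SRAM cache; a matching
     * row address is a cache hit served in 15 ns without asserting /RE,
     * while a miss reloads the cache in 35 ns. Timing values are from the
     * article; the structure is a simplification.
     */
    #include <stdio.h>

    typedef struct {
        int lrr;        /* last row read into the SRAM cache */
        int lrr_valid;
    } edram_t;

    static int edram_read_ns(edram_t *d, int row)
    {
        if (d->lrr_valid && d->lrr == row)
            return 15;  /* cache hit: /RE stays high, 15-ns SRAM access */

        /* Cache miss: the row access loads the cache in 35 ns; /RE can go
         * back high at once, so the 25-ns precharge overlaps later reads. */
        d->lrr = row;
        d->lrr_valid = 1;
        return 35;
    }

    int main(void)
    {
        edram_t d = { 0, 0 };
        int rows[] = { 3, 3, 3, 7, 7 };   /* a burst within row 3, then row 7 */

        for (int i = 0; i < 5; i++)
            printf("read row %d: %2d ns\n", rows[i], edram_read_ns(&d, rows[i]));
        return 0;
    }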

Cached DRAM

Mitsubishi Electronics America (Sunnyvale, CA) will introduce a 4-Mbit cached DRAM (CDRAM) in the third quarter. The CDRAM increases performance by integrating a 256-K x 16-bit DRAM array, a 1-K x 16-bit SRAM cache, and a synchronous control interface onto a single chip. The CDRAM uses a +3.3-V power supply and LVTTL I/O levels. It comes in a 70-pin TSOP-II with a 0.65-mm lead pitch (see Fig. 2).

The DRAM array is organized as 256 K x 16 bits and has a 75-ns row-access time and a 150-ns read/write-cycle time. It has a dedicated 10-bit multiplexed address input bus. The DRAM array can be accessed directly from the I/O pins or can load the cache via a buffer and a 128-bit bus.

The CDRAM's on-chip cache is segmented into 64 cache lines with eight 16-bit words per line. The cache is addressed by a dedicated 10-bit address bus. The cache can be either direct mapped or set associative, depending on the implementation of the external cache controller. During a read, data are available in 15 ns if a cache hit is detected by the cache controller. If data are not in cache, the controller performs a DRAM reference and transfers a new line (128 bits) to the cache and the output pins in 90 ns (six clock cycles). The integrated 128-bit-wide bus from DRAM to SRAM loads the entire cache line in 90 ns.

During write operations, the CDRAM cache can operate as either a write-through or a write-back cache, depending on the controller implementation. In write-back mode, data are written to the cache in 15 ns on a write hit and are not updated in the DRAM; the cache controller must keep track of this incoherency. When a dirty cache line is replaced, a 280-ns write-back cycle must be performed to store the line into the DRAM while the next cache line is read into the cache. The 128-bit-wide data-transfer buffer allows the write portion of the write-back cycle to be hidden in some cases. If the write cycle is to a location not currently mapped to the SRAM cache (a write miss), a new line is read to SRAM and the data are written to the new line in 105 ns.

The CDRAM uses a synchronous clock to control all operations. All control and address signals are set up before the synchronous clock edges, and access times are measured from the synchronous clock. During burst read or write operations from cache, new data can be transferred every 15 ns for a maximum burst data rate of 267 Mbytes/s in a 32-bit memory using two CDRAMs.

The CDRAM has a write-mask register allowing write-per-bit transfers to DRAM. It also has a synchronous output register that can operate in transparent, latched, and registered modes. Clock mask inputs allow the clock to be disabled to reduce power. Mitsubishi also plans to offer a 4-M x 4-bit CDRAM.
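
Because the tag store and mapping policy live in the external cache controller, the on-chip cache can be treated as 64 lines of eight words. The sketch below assumes a direct-mapped arrangement over the 256-K-word address space (3 word bits, 6 line bits, 9 tag bits) with the article's 15-ns hit and 90-ns line-fill times; the address split and the data structure are assumptions for illustration, not Mitsubishi's required mapping.

    /* Illustrative direct-mapped view of the CDRAM's 1-K x 16 cache:
     * 64 lines of eight 16-bit words over an 18-bit word address.
     * The real mapping is up to the external cache controller.
     */
    #include <stdio.h>

    #define WORD_BITS 3                     /* 8 words per line */
    #define LINE_BITS 6                     /* 64 cache lines   */

    typedef struct {
        unsigned tag[1 << LINE_BITS];
        int      valid[1 << LINE_BITS];
    } cdram_cache_t;

    /* Returns read latency in ns: 15-ns hit, 90-ns line fill on a miss. */
    static int cdram_read_ns(cdram_cache_t *c, unsigned word_addr)
    {
        unsigned line = (word_addr >> WORD_BITS) & ((1u << LINE_BITS) - 1);
        unsigned tag  =  word_addr >> (WORD_BITS + LINE_BITS);

        if (c->valid[line] && c->tag[line] == tag)
            return 15;                      /* hit in the on-chip SRAM    */

        c->tag[line] = tag;                 /* 128-bit bus fills the line */
        c->valid[line] = 1;
        return 90;
    }

    int main(void)
    {
        cdram_cache_t c = { {0}, {0} };
        unsigned addrs[] = { 0x00100, 0x00101, 0x00107, 0x20100 };

        for (int i = 0; i < 4; i++)
            printf("word 0x%05X: %d ns\n", addrs[i], cdram_read_ns(&c, addrs[i]));
        return 0;
    }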

Synchronous DRAM

The synchronous DRAM (SDRAM) is a proposed enhancement to the JEDEC standard DRAM that provides a higher level of performance by operating the control interfaces from a synchronous clock and by implementing on-chip interleaving and burst-mode address generation (see Fig. 3).

The SDRAM uses synchronous control at clock rates up to 66 or 100 MHz. A clock-enable input disables the clock so that the chip can enter a low-power mode. The SDRAM is controlled with the same /RAS, /CAS, and /WE signals used by a standard DRAM. A new DQM input masks read and write data. A new /CS input enables command execution and allows full-page bursts to be suspended.

The SDRAM has a programmable mode register that can be loaded when /RAS, /CAS, and /WE are low on the same clock edge. This register specifies the number of clock cycles (1, 2, or 3) between the column address and the first data word, the burst size (1, 2, 4, 8, or full page), and the burst mode (Intel interleave or linear; see the burst-ordering sketch at the end of this section). It is necessary to execute a memory cycle to switch between single-word and burst modes and between Intel and linear modes.

The SDRAM operates in a page-caching mode if the row stays enabled after the initial access. In this mode, data within a page are available randomly with a 30-ns page-access time. On a page miss, a new data page is accessed in 90 ns. The SDRAM can do a page-mode write-hit cycle in 15 ns. On a page miss, it has to precharge and access the next row; the write-miss time is 75 ns.

The SDRAM is capable of read or write data bursts at the synchronous clock rate (up to 66 or 100 MHz) after the initial latency has been met. This is possible because of on-chip interleaving and pipelining. The SDRAM can burst 267 Mbytes/s for 66-MHz parts and 400 Mbytes/s for 100-MHz parts (for a 32-bit data-word configuration).

Two different versions of the SDRAM have been proposed and are now being sampled. Both products have 16-Mbit density, operate from a +3.3-V power supply, and use LVTTL I/O levels.

Samsung Semiconductor (San Jose, CA) has a 2-M x 8-bit SDRAM in a 32-pin SOJ package that uses a level-sensitive /RAS and a single-bank architecture. In this approach, the /RAS line must be held for the duration of active and precharge cycles rather than pulsed for one cycle, which makes /RAS behave much as it does in a standard DRAM. Samsung uses a single-bank architecture to simplify the design and operation of the chip. Since only one bank of memory on the SDRAM can be active at a time, the chip has a less complex control sequence and will use less power than other proposed SDRAMs. Samsung also believes that this approach will cost less to manufacture.

The proposed JEDEC standard 16-Mbit SDRAM has an edge-sensitive /RAS and a dual-bank architecture. This approach, advocated by NEC Electronics, Inc. (Mountain View, CA) and several other DRAM vendors, allows simultaneous operation on a second memory bank on the chip during a first-bank access. This allows data prefetch from the second bank to overlap with the current read or write operation in some cases. The current NEC SDRAM product comes in a 44-pin TSOP-II with a 0.8-mm pitch.

Samsung and NEC are sampling 2-M x 8-bit versions of the SDRAM, with production expected in late 1993. Other DRAM vendors, such as Mitsubishi, Fujitsu Microelectronics, Hitachi, Oki Semiconductor, Texas Instruments, and Micron Semiconductor, plan compatible SDRAM products during 1994.
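
The burst-ordering sketch referenced above shows the two column sequences the mode register can select. The XOR-versus-add formulation below is the conventional way to describe linear and Intel-interleave bursts; treat it as an illustration rather than a quotation from the JEDEC draft.

    /* Illustrative generation of SDRAM burst column sequences for the two
     * burst types selectable in the mode register: linear (sequential)
     * and Intel interleave. Burst length is assumed to be 1, 2, 4, or 8.
     */
    #include <stdio.h>

    static void burst_order(unsigned start_col, unsigned len, int interleave)
    {
        unsigned mask = len - 1;            /* low bits that wrap within the burst */
        unsigned base = start_col & ~mask;

        printf("%-10s start %u:", interleave ? "interleave" : "linear", start_col);
        for (unsigned i = 0; i < len; i++) {
            unsigned offset = interleave ? ((start_col ^ i) & mask)
                                         : ((start_col + i) & mask);
            printf(" %u", base | offset);
        }
        printf("\n");
    }

    int main(void)
    {
        burst_order(5, 8, 0);   /* linear:     5 6 7 0 1 2 3 4 */
        burst_order(5, 8, 1);   /* interleave: 5 4 7 6 1 0 3 2 */
        return 0;
    }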

Rambus DRAM

Rambus Inc. (Mountain View, CA) has developed a new DRAM architecture and technology and has licensed it to DRAM manufacturers NEC, Fujitsu, and Toshiba. The Rambus DRAM (RDRAM) uses a unique byte-wide multiplexed control/address/data bus (the Rambus Channel) that uses low-voltage terminated transmission lines operating at a synchronous clock rate of 250 MHz to transfer data between the RDRAM and the controller. This bus is capable of burst transfers at 500 Mbytes/s because data are clocked on both clock edges.

The Rambus concept is unique because each chip has its own built-in controller that handles address decoding and page-cache management. As a result, a 2-Mbyte memory subsystem can be built with just one chip and the bus controller. Additional memory can be added to the system by adding chips to the bus, and memories of different densities can be mixed on the same bus (see Fig. 4). RDRAMs use a unique 32-pin surface-mount vertical package to achieve a high packing density. Rambus has also developed a module socket with Augat that allows bus expansion.

A typical RDRAM product, such as the NEC µPD488170, is organized as 2 M x 9 bits with two active memory banks, each with a page cache of 2 Kbytes (16 Kbits). Data are accessed 36 bits wide internally and interleaved four ways on transfers to the bus. Parity is checked and generated internally. The RDRAM operates from a +3-V power supply.

Unlike a typical DRAM, the RDRAM handles memory reads and writes as bus request, data, and acknowledge packets. A read operation starts with a 6-byte-long (12-ns) request packet that contains the op code, a 36-bit address, and the burst length (1 to 256 bytes). After 16 ns, the RDRAM responds with a positive acknowledge if the data are currently in its page cache, or with a negative acknowledge if the data are not in cache or if a refresh cycle is in progress. If the data are in the page cache, the RDRAM begins supplying read data to the bus 28 ns after the request packet, so the effective page-hit access time is 40 ns (12-ns request, 28-ns access). On a page miss, the controller waits 64 ns after the first negative acknowledge to request the data again. If the data are available on the second request, the RDRAM responds with data after 28 ns; the effective page-miss access time is 116 ns (12-ns request, 64-ns wait, 12-ns request, 28-ns access).

During write cycles, the controller places data on the bus starting 4 ns after the request packet. If the RDRAM can accept the data, it replies with a positive acknowledge after 16 ns, for an effective page-write-hit time of 28 ns (12-ns request, 16-ns acknowledge). If the write data are to a different page, or if a refresh cycle is in progress, the RDRAM responds with a negative acknowledge after 16 ns. The controller then waits 64 ns after the first request packet before issuing another request packet and write data. If the RDRAM acknowledges the second packet, the effective write-miss time is 104 ns (12-ns request, 64-ns wait, 12-ns request, 16-ns acknowledge).

The high bandwidth of the RDRAM transfer swamps the initial overhead in long bursts of contiguous data, which is just what happens in frame buffers. Accordingly, Brooktree Corp. (San Diego) has just licensed Rambus technology. Brooktree is not committed to any products using Rambus, but some of the company's current products connect to frame buffers directly, so a change of memory interface would not be a huge leap.
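
The effective access times quoted above are simply sums of the packet phases. The sketch below encodes that bookkeeping with the article's figures; it ignores the data-burst transfer time itself and the 4-ns write-data offset, so it is a simplification rather than a full channel model.

    /* Effective RDRAM access times as sums of packet phases, using the
     * figures in the text: 12-ns request packet, read data 28 ns after
     * the request on a hit, 16-ns write acknowledge, and a 64-ns retry
     * wait after a negative acknowledge.
     */
    #include <stdio.h>

    #define REQUEST_NS 12
    #define READ_NS    28
    #define ACK_NS     16
    #define RETRY_NS   64

    static int rdram_access_ns(int is_write, int page_hit)
    {
        int ns = REQUEST_NS + (is_write ? ACK_NS : READ_NS);
        if (!page_hit)                 /* negative acknowledge: wait, re-issue */
            ns += RETRY_NS + REQUEST_NS;
        return ns;
    }

    int main(void)
    {
        printf("read  hit : %3d ns\n", rdram_access_ns(0, 1));   /*  40 */
        printf("read  miss: %3d ns\n", rdram_access_ns(0, 0));   /* 116 */
        printf("write hit : %3d ns\n", rdram_access_ns(1, 1));   /*  28 */
        printf("write miss: %3d ns\n", rdram_access_ns(1, 0));   /* 104 */
        return 0;
    }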

The Ramlink concept

The Ramlink concept (see Fig. 5) is the work of the IEEE Computer Society P1596.4 Working Group. Ramlink is similar to Rambus in that it uses a byte-wide, low-voltage-swing interface between chips and operates at a 500-Mbyte/s data-transfer rate. Ramlink differs from Rambus by employing differential low-voltage point-to-point interconnect buses and a ring architecture rather than a bus architecture. Each controller would control one or more ringlets and up to 60 slave memories. Bandwidth could be increased by paralleling multiple ringlets per controller.

Ramlink appears to have the advantage of supporting larger physical memory sizes than Rambus, since each memory is effectively a retransmitter; Rambus is limited to about 10 cm of bus length without adding a similar bus extender. On the other hand, each Ramlink chip has twice the interface pins, and each Ramlink node adds about 6 ns of data-transfer latency during transmissions. Therefore, in larger systems, a significant amount of data-access latency could be incurred.

As with Rambus, the key advantage of Ramlink is high data-transfer bandwidth, so the approach is best used in systems that require large block transfers of information. The Ramlink concept is not yet supported by semiconductor memory suppliers. Hans Wiggers, chairman of the working group, doesn't expect the concept to reach silicon until the 64-Mbit DRAM generation. He believes that its primary application is likely to be large-scale computers such as supercomputers.
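
The latency concern above can be put in rough numbers. The sketch below simply multiplies the article's roughly 6-ns per-node retransmission delay by the number of nodes a transfer passes through; treating the worst case as one hop per intervening node on a single ringlet is my simplification.

    /* Rough Ramlink ringlet latency: about 6 ns of retransmission delay
     * is added for each node a transfer passes through (the 6-ns figure
     * is from the text; one hop per intervening node is an assumption).
     */
    #include <stdio.h>

    static double ringlet_added_ns(int nodes_traversed)
    {
        return 6.0 * nodes_traversed;
    }

    int main(void)
    {
        printf("small ringlet, 8 memories (worst case): %.0f ns added\n",
               ringlet_added_ns(7));
        printf("full ringlet, 60 memories (worst case): %.0f ns added\n",
               ringlet_added_ns(59));
        return 0;
    }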

For more information on the various high-performance DRAMs, contact the
companies listed below:

Mitsubishi Electronics America, Inc. 1070 East Arques Avenue Sunnyvale,
CA 94086 408-730-5900

NEC Electronics, Inc. 401 Ellis Street PO Box 7241 Mountain View, CA
94039 415-960-6000

Rambus Inc. 2465 Latham Street Mountain View, CA 94040 415-903-3800

Ramtron International Corporation 1850 Ramtron Drive Colorado
Springs, CO 80921 800-545-3726

Samsung Semiconductor 3655 North First Street San Jose, CA 95134
408-954-7000

IEEE Computer Society P1596.4 Working Group Hans Wiggers 415-857-2433

CAPTIONS:

Fig. 1. The 2-Kbit cache of the EDRAM is integrated directly into the
column decoder of the DRAM array. It directly caches one row at a time.

Fig. 2. The cached DRAM integrates a 15-ns 1-K x 16-bit SRAM cache onto
the chip. The cache is segmented into 64 cache lines with eight 16-bit
words per line.

Fig. 3. The synchronous DRAM operates the controls from a synchronous
clock and implements on-chip interleaving and burst-mode address
generation. Otherwise it is similar to conventional DRAMs.

Fig. 4. The Rambus uses a dedicated 500-Mbyte/s bus to transfer data in
packets. The bus interface must be integrated on any chips that access the
memory.

Fig. 5. The proposed Ramlink, like Rambus, passes data at 500 Mbytes/s on
a byte-wide bus. However, since each memory retransmits the signal, much
larger arrays are possible.
