Data/fax modem ICs add voice

Voice-capable modems offer various design compromises and capabilities

BY WARD PITKIN
Sierra Semiconductor Corp.
San Jose, CA

Data/fax/voice modem chipsets are available from several companies, including Cirrus Logic, Inc., EXAR Corp., Phylon, Inc., and Sierra Semiconductor Corp. Choosing the best such modem for an application requires an understanding of the characteristics and capabilities of these chipsets.
The design of a data/fax/voice system, such as voice mail, involves a trade-off between system cost and the voice feature set of the mixed-signal chipset. There is no absolute definition describing the features a voice modem design must include to provide voice capability (see box, “A modem's voice”).
Typically, voice is added to a pre-existing data/fax chipset, and the resources available may be defined by the maximum data throughput available with that solution. For example, a V.22bis modem chipset uses a less powerful digital signal processor (DSP) than a V.32bis chipset. This may limit the voice feature set that can be implemented.
A data/fax/voice chipset must incorporate an analog front end (AFE), a DSP, and a controller. If no external memory is used in the system, then at least 100 bytes of RAM must be made available inside the chipset for voice buffering.
The AFE contains the A/D and D/A converter circuitry. It may also incorporate some of the call-progress tone filters as well as the filtering required by FCC Part 68. The chipset designer must choose whether tone filtering is done in analog in the front end or digitally in the DSP. Many solutions involve a balancing act and distribute the processing between the two to achieve the most cost-effective solution in the smallest space.
Adding voice to a chipset can require higher sampling rates from the AFE. For example, Microsoft and Compaq have defined a subset of wave files known as Business Audio. Business Audio-compatible files are sampled at 11,025 samples/s, without compression or coding, in mono 8-bit offset binary format. Fully supporting such a sampling rate requires not only an extension to the A/D and D/A converter circuitry, but also an expanded filtering arrangement.
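As a rough illustration of what this format implies, the sketch below converts signed 16-bit samples to 8-bit offset binary and computes the raw payload rate. The function names are hypothetical, and the 16-bit input format is an assumption; only the sampling rate and sample width come from the description above.

```python
SAMPLE_RATE = 11_025      # samples/s, from the Business Audio description
BITS_PER_SAMPLE = 8       # mono, 8-bit offset binary

def to_offset_binary_8(sample_16):
    """Map a signed 16-bit sample (-32768..32767) to 8-bit offset binary (0..255).

    Offset binary places silence at mid-scale (128) rather than at 0.
    """
    if not -32768 <= sample_16 <= 32767:
        raise ValueError("sample out of 16-bit range")
    # Keep the top 8 bits, then shift the range from -128..127 to 0..255.
    return (sample_16 >> 8) + 128

def raw_bit_rate(sample_rate=SAMPLE_RATE, bits=BITS_PER_SAMPLE):
    """Uncompressed payload rate in bits/s, before any serial framing overhead."""
    return sample_rate * bits
```

With these figures, raw_bit_rate() gives 88,200 bits/s of uncompressed payload.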
The primary tasks of the DSP in voice mode are to detect dual-tone multifrequency (DTMF) signals and to provide encoding and decoding for voice compression. Touch Tone telephones produce tone pairs that represent the row and column addresses of the Touch Tone pad. Four rows and four columns are defined, giving 16 possible characters: 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, A, B, C, D, #, and *.
The DSP detects tone pairs in the presence of voice energy. Avoiding false DTMF detection is paramount. In fact, much effort has been expended to develop tests for “talk-off” performance–the ability of the DTMF detector to resist detecting voice as DTMF. Two commonly used tests for measuring talk-off are the Mitel talk tape and the Bellcore TR-TSY-000763 talk tape. Both tapes contain voice energy, and each test specifies a maximum number of false detections allowed.
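The row-and-column scheme can be sketched with the Goertzel algorithm, one common way to measure signal energy at a handful of known frequencies. The tone frequencies are the standard DTMF assignments. The 8,000 samples/s rate, the 205-sample block, and all function names are assumptions for illustration. The sketch simply picks the strongest row and column tone and has none of the talk-off safeguards a real detector needs.

```python
import math

ROW_FREQS = (697, 770, 852, 941)        # Hz, standard DTMF row tones
COL_FREQS = (1209, 1336, 1477, 1633)    # Hz, standard DTMF column tones
KEYPAD = ("123A", "456B", "789C", "*0#D")

def goertzel_power(samples, freq, sample_rate):
    """Return the squared magnitude of `samples` at `freq` (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def detect_key(samples, sample_rate=8000):
    """Pick the strongest row tone and column tone and map the pair to a key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, ROW_FREQS[i], sample_rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, COL_FREQS[i], sample_rate))
    return KEYPAD[row][col]

def make_tone_pair(row_freq, col_freq, n=205, sample_rate=8000):
    """Synthesize a clean tone pair (hypothetical helper, for trying the detector)."""
    return [math.sin(2 * math.pi * row_freq * t / sample_rate)
            + math.sin(2 * math.pi * col_freq * t / sample_rate)
            for t in range(n)]
```

For example, detect_key(make_tone_pair(770, 1336)) selects the second row and second column. A practical detector would add checks such as minimum level, row-to-column balance, and tone duration before reporting a digit, since those are what resist false detection on voice.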

Voice compression
How to choose appropriate voice compression for a given data/fax/voice modem is a hotly contested issue. The TR29.2 standards committee has identified 17 criteria for choosing voice compression, but the choice for the silicon manufacturer may prove to be much more straightforward. The trade-off is between voice quality and DTE-DCE data rate on one side, and required DSP processing power on the other, where DCE is the modem and DTE is the host PC.
The telecommunications industry identifies three levels of voice quality: toll, communication, and synthetic. Toll-quality voice is voice reproduced without obvious signs of compression. Communication-quality voice typically includes white background noise, but little noticeable distortion. Synthetic voice has a machine quality, with obvious distortion.
The lower the DTE-DCE data rate, the less strain on DTE-processing resources, and the smaller the message size on the DTE data-storage device. Thus the choice is made by deciding on the processing power of the DSP (effectively a cost decision on the silicon size) and the voice quality and data rate required. In practice, the voice must be at least communication quality, and the data rate must be limited to 19,200 bits/s, for operation under Microsoft Windows 3.1.
Available DSP horsepower is usually defined by the data capability of the modem. For V.22bis, as little as 2 MIPS may be available after DTMF detection; V.32bis may have as much as 10 MIPS available after DTMF detection. If both products are to use the same compression, the total usable DSP horsepower is 2 MIPS.
Continuously variable slope delta (CVSD) modulation is a voice compression algorithm that meets these criteria. It is also possible to find ADPCM algorithms that meet these criteria. However, ADPCM is an adaptive technique that does not recover well during fast-forward and rewind operations.
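A minimal sketch of the CVSD idea follows: one bit per sample, a step size that grows when three identical bits in a row indicate the estimate is falling behind, and a leaky integrator that rebuilds the waveform. Every constant and function name here is an assumption chosen for illustration, not a value from any particular chipset or standard.

```python
# Illustrative constants (assumptions, not taken from any chipset).
MIN_STEP = 10.0        # smallest step size, used on quiet or slowly changing input
MAX_STEP = 1280.0      # largest step size, approached during slope overload
LEAK = 0.98            # leaky-integrator factor applied to the estimate each sample
SYLLABIC_RATE = 0.1    # how quickly the step size moves toward its target

def _new_state():
    # History starts as 0b010 so the first bits cannot look like a run.
    return {"estimate": 0.0, "step": MIN_STEP, "history": 0b010}

def _update(state, bit):
    """Advance the predictor shared by the encoder and the decoder."""
    state["history"] = ((state["history"] << 1) | bit) & 0b111
    # Three identical bits in a row suggest the estimate is falling behind.
    overloaded = state["history"] in (0b000, 0b111)
    target = MAX_STEP if overloaded else MIN_STEP
    state["step"] += (target - state["step"]) * SYLLABIC_RATE
    delta = state["step"] if bit else -state["step"]
    state["estimate"] = LEAK * state["estimate"] + delta
    return state["estimate"]

def cvsd_encode(samples):
    """Emit one bit per sample: 1 if the input is at or above the estimate."""
    state = _new_state()
    bits = []
    for x in samples:
        bit = 1 if x >= state["estimate"] else 0
        bits.append(bit)
        _update(state, bit)
    return bits

def cvsd_decode(bits):
    """Rebuild the waveform by running the same predictor on the bit stream."""
    state = _new_state()
    return [_update(state, bit) for bit in bits]
```

Because both the estimate and the step size decay toward values set by recent bits, a decoder started partway through a stream, as in cvsd_decode(bits[500:]), converges on the output of a decoder that saw the whole stream. That self-correcting behavior is the kind of recovery at issue in fast-forward and rewind operations.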

Interrupt latency
The controller in the chipset implements the voice extension to the standard AT command set and is responsible for the PC interface. One of the hottest topics in DCE controller interface design today is the issue of interrupt latency and data throughput.
Modems are traditionally interfaced through a serial port. External modems interface to the serial side of a PC COM port. Internal modems contain their own equivalent of a PC-compatible COM port and are addressed from the parallel side. In both cases, data throughput is limited to 115,200 bits/s on an IBM PC. This assumes DOS foreground operation; the situation under Windows 3.1 is dramatically worse with a guaranteed data rate of only 19,200 bits/s.
The slowness under Windows is due to interrupt latency: the time it takes all applications running under Windows to store their variables and hand over control to the COM routine. Latencies of up to 400 ms have been reported during such operations as floppy-disk accesses.
Standard PC-compatible COM ports use 16450-type UARTs, which interrupt the PC once for every byte received or transmitted. The 16550 is the much-ballyhooed FIFO-enhanced version of the 16450. It uses a 16-byte FIFO to store several bytes before an interrupt is required. The PC application programs a “skid,” which specifies how much of the FIFO is still free when the UART generates an interrupt. This remaining FIFO RAM is filled during the interrupt response latency.
Choosing the skid ideally requires knowledge of the average and maximum interrupt latency times. Unfortunately, these numbers are not guaranteed for Windows 3.1, and thus a heuristic approach is required. In practice, setting the maximum allowable skid of 14 (that is, interrupting the PC after two FIFO bytes are filled) still proves inadequate to prevent loss of data.
With a data rate of 38,400 bits/s, the first byte lost as a result of under- or overrun occurs during the first latency greater than 6.15 ms. During record operations, this results in loss of data through overrun. This loss is perceived on playback as missing syllables and words. Playback operations that suffer 16550 underrun sound as though silent gaps have been inserted in the recorded speech.
To avoid the hazards of 16550 operation, it is recommended that a DMA interface be used. A DMA interface interrupts the PC only once for every 4,096 bytes transferred. This gives the PC more time to process the foreground application, allowing it to run more smoothly and avoiding overruns and underruns.
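The difference in interrupt load can be put in rough numbers. The sketch below assumes a steady stream of 11,025 bytes/s (uncompressed 8-bit voice at 11,025 samples/s) as the workload; that workload and the function names are assumptions for illustration, while the one-byte and 4,096-byte figures come from the text.

```python
def interrupts_per_second(bytes_per_second, bytes_per_interrupt):
    """Average interrupt rate for a steady byte stream."""
    return bytes_per_second / bytes_per_interrupt

def seconds_between_interrupts(bytes_per_second, bytes_per_interrupt):
    """Average time the host has between interrupts."""
    return bytes_per_interrupt / bytes_per_second

STREAM = 11_025  # bytes/s, assumed workload

per_byte_rate = interrupts_per_second(STREAM, 1)      # one interrupt per byte
dma_rate = interrupts_per_second(STREAM, 4096)        # one interrupt per 4,096 bytes
dma_gap = seconds_between_interrupts(STREAM, 4096)    # time between DMA interrupts
```

Under these assumptions the per-byte interface interrupts the host about 11,000 times a second, while the DMA interface interrupts fewer than three times a second, leaving roughly a third of a second between interrupts.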
The DMA interface takes advantage of the DMA controller hardware that exists in every PC. The DMA controller is a microprocessor in its own right and has the ability to move data to and from memory and I/O devices when the microprocessor is between memory cycles. DMA interfaces can be implemented for less than half the price of 16550 interfaces and they provide many times the performance. DMA allows an effective data rate of 11 Mbits/s under Microsoft Windows 3.1 compared to the 16550's possible 38.4 kbits/s.

MCI Wave-capable modems
One of the most exciting new developments in voice modem design is the advent of the Windows 3.1 MCI (media control interface) Wave-capable modem. An example of an MCI Wave-capable data/fax/voice chipset is the Sierra Semiconductor ST4743. With a DMA interface and Microsoft Windows MCI Wave-input and Wave-output drivers, such a modem can act as a PC sound board.
MCI-compatible presentation software like MediaBlitz, multimedia tools like Monologue text-to-speech, and the Windows accessories Sound Recorder and Media Player are all examples of multimedia software that works with MCI Wave-capable modems. Object linking and embedding (OLE) operations, including the embedding of sound and multimedia presentations in documents and spreadsheets, can also be carried out with these modems. Be wary of modems that claim only Wave File or Business Audio compatibility. These modems cannot act as sound boards and do not interface to third-party multimedia software.

BOX:

A modem's voice

“Voice,” in the context of data communications, is defined in the most limited sense as the capability of a modem to implement voice mail. Voice mail is the ability to record and play back messages from and to a phone line, and the ability to detect dual-tone multifrequency (DTMF) signals, that is, Touch Tone. Basic public-switched telephone network (PSTN) interface circuitry and systems requirements for voice are the same as those required for a data/fax solution.
While it is possible to implement voice mail using only DTMF detection, recording, and playback capability, most data/fax/voice solutions integrate extra features. These features may include Caller ID detection, IS-101 event detection, telephony application programming interface (TAPI) support, and voice compression.

Caller ID. Currently available in 38 states, Caller ID provides the phone number of the calling party to the receiving party before the receiving party is required to answer the phone, allowing the receiving party to decide whether or not to answer. Several service levels of Caller ID are available, including single and multiple message format. In single-message format, the time, date, and calling number are made available. In multiple-message format, the name of the calling party is also provided.
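A minimal sketch of decoding a single-message-format record follows. It assumes the commonly documented Bellcore layout: a type byte of 0x04, a length byte, eight ASCII digits of date and time (MMDDHHMM), the calling number, and a checksum that makes all bytes sum to zero modulo 256. Treat that layout and the function names as assumptions to check against the specification.

```python
SDMF_TYPE = 0x04  # assumed message-type byte for single message format

def parse_single_message(frame: bytes):
    """Decode one demodulated single-message-format record into its fields."""
    if len(frame) < 3 or len(frame) != frame[1] + 3:
        raise ValueError("length byte does not match frame size")
    if sum(frame) % 256 != 0:
        raise ValueError("bad checksum")
    if frame[0] != SDMF_TYPE:
        raise ValueError("not a single-message-format record")
    text = frame[2:-1].decode("ascii")
    if len(text) < 8:
        raise ValueError("record too short for date and time")
    return {
        "month": text[0:2],
        "day": text[2:4],
        "hour": text[4:6],
        "minute": text[6:8],
        "number": text[8:],
    }

def build_single_message(date_time: str, number: str) -> bytes:
    """Build a record for testing the parser (hypothetical helper)."""
    payload = (date_time + number).encode("ascii")
    body = bytes([SDMF_TYPE, len(payload)]) + payload
    checksum = (-sum(body)) % 256   # two's complement of the byte sum
    return body + bytes([checksum])
```

For instance, parse_single_message(build_single_message("01151430", "4085551234")) yields the date, time, and calling number as separate fields. The multiple-message format carries additional parameters, such as the caller's name, and is not handled here.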

IS-101. IS-101 is an interim standard for voice in modems offered by the TIA (Telecommunications Industry Association). In fact, the standard is part of the TR29.2 Facsimile DTE-DCE interface standards effort–the same effort that has produced the Class 1 and Class 2 Fax standards.
IS-101 specifies a set of event reports that a DCE (modem) might provide to a DTE (host PC). The host may ask which of the total set of services are actually available from a particular modem. Many of the event detections are optional and the technique by which the modem detects each of these events is not specified.
Examples of IS-101 events range from the simple to the complex, and include ring, ring-back, dial tone, and silence reporting, as well as the more exotic Bong tone (the tone you hear when you use your calling card) and SIT tone (the three ascending tones that indicate a mistake in dialing) reporting.
IS-101 has a defined life of only one year, beginning from September 1993. It is due to be replaced by a true standard when the project PN-3131 completes IS-101 by adding a standard voice compression. Meanwhile, each chipset manufacturer has created its own voice command set. In practice, these proprietary command sets implement a subset of IS-101.

TAPI. TAPI is part of the Microsoft At Work program and is intended to allow Microsoft Windows to act as an interface to the telephone network. It provides a “feature phone” type interface, linking databases and documents to graphically represented speed and repeat dialers, and gives the user easy access to Caller ID, Centrex control, and other telephone service provider amenities.

CAPTIONS:

For opening photo:

Modems that integrate high-speed data, facsimile, and voice mail capabilities onto low-cost personal computer add-in boards are now made possible through DSP-based chipsets.

For diagram:

This diagram illustrates the typical functions that may be incorporated in a voice-capable modem operating under Microsoft Windows 3.1.
