Designer’s Guide: Safety-critical processors

When the software controlling a dangerous system suffers a glitch, you’ll need the right type of processor to avoid a potentially fatal failure.

Editor's Note: Welcome to AspenCore's Special Project on the safety of autonomous vehicles. This article, along with the articles listed on the last page, forms an in-depth look from a variety of angles at the business and technology of autonomous vehicle safety.

By Richard Quinnell, Special Projects editor

Many systems, including industrial machinery, medical devices, and automobiles, are safety-critical: they must detect their own operational failures in real time and react in a way that avoids harming the people using them. Creating a processor-based system that provides this functional safety requires a combination of hardware error checking, hardware self-test, and system redundancy to deliver the software-independent fault detection and safe resolution these systems need. Fortunately, processors are available that handle much of the hardware heavy lifting needed for safety-critical systems.

The need for functional safety in processor-based systems is rising, especially in automotive applications. Even setting aside the whole movement toward autonomous vehicles, automobiles increasingly rely on microprocessors to implement critical functions. Anti-lock braking systems, engine control, and steering are just a few of the vehicle functions now under processor control that have major safety implications. Should any of these processors make even a single misstep without being caught, the results could be fatal.

Unfortunately, the opportunities for something to go wrong in a processor-based design are legion. As the diagram below shows, proper code execution requires many system elements to work correctly. The processor and all its internal registers, the program and cache memories, the RAM, and the bus interfaces among them, along with the system power and clocks, all must operate flawlessly with precision timing. But as anyone who has had their computer lock up for no apparent reason knows, a single bit change anywhere in this system can derail the entire operation. A noise glitch on any line of the bus, a stray alpha particle or cosmic ray strike (yes, they do happen, and more often than one might think) that alters a bit in memory or a register, low voltage, clock drift, and a host of other sources can cause the system to stumble.

Figure 1: The core of processor-based systems offers many opportunities for noise glitches and other single event upsets to completely derail proper software execution.

Such errors can be made unlikely through careful design, but not eliminated. For a system to be deemed safe, then, it must be able to detect such an error in real time and respond appropriately to mitigate its effects. What constitutes proper mitigation is highly application-dependent, but the methods for detecting an error are well-established and common to safety-critical designs. Transactions on the system bus, for instance, can be monitored by including error correction coding (ECC) or cyclic redundancy check (CRC) data with each transaction. Voltage monitors can keep tabs on power sources, and watchdog timers can help monitor clock signals.
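As a rough software illustration of the CRC idea, the sketch below appends a checksum to a transaction and shows that a single flipped bit is caught. The frame layout, the function names, and the choice of CRC-32 are assumptions made for this example; real bus protocols define their own polynomials and field widths, and the check is normally done in hardware.

```python
import zlib

def frame_with_crc(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as 4 big-endian bytes (hypothetical layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_frame(frame: bytes) -> bool:
    """Return True if the trailing CRC matches the payload that precedes it."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == received

def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Simulate a single event upset by inverting one bit of the frame."""
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)

frame = frame_with_crc(b"\x10\x20\x30\x40")   # made-up bus write
print(check_frame(frame))                     # True
print(check_frame(flip_bit(frame, 5)))        # False
```

A CRC only detects the error. Deciding what to do next (retry, switch to a safe state, raise an alarm) remains an application-level decision.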

A watchdog timer can also provide a gross indication of proper processor operation by having the processor reset the timer on a regular basis. If the processor fails in that duty, the watchdog sends a signal to alert the system to the failure once the timer has run out. This involves making a tradeoff between the software overhead of frequent timer resets and the delay in signaling processor failure, however.
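The tradeoff can be made concrete with a small simulation. This is a minimal sketch, not a model of any particular part: the `Watchdog` class, the tick-based time base, and the `detection_latency` helper are all invented for illustration, and the sketch assumes the reset period is shorter than the timeout.

```python
class Watchdog:
    """Down-counter that must be reloaded ("kicked") before it reaches zero."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.expired = False

    def kick(self) -> None:
        """Called by healthy software to reload the counter."""
        self.remaining = self.timeout_ticks

    def tick(self) -> None:
        """Driven by an independent clock; latches 'expired' at zero."""
        if self.expired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True


def detection_latency(timeout_ticks: int, kick_period: int, hang_at: int) -> int:
    """Ticks between a processor hang and the watchdog alert.

    Assumes kick_period < timeout_ticks, so a healthy processor never
    lets the counter run out.
    """
    wd = Watchdog(timeout_ticks)
    t = 0
    while not wd.expired:
        # The processor only kicks the watchdog while it is still running.
        if t < hang_at and t % kick_period == 0:
            wd.kick()
        wd.tick()
        t += 1
    return t - hang_at


# Frequent kicks with a short timeout: more overhead, faster detection.
print(detection_latency(timeout_ticks=10, kick_period=5, hang_at=23))    # 7
# Rare kicks with a long timeout: less overhead, slower detection.
print(detection_latency(timeout_ticks=100, kick_period=50, hang_at=23))  # 77
```

In both cases the delay before the alert is bounded by the timeout, which is why the timeout has to be chosen against how long the system can tolerate an undetected failure.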

Yet detecting a failure is only one part of functional safety. The other part is responding to the failure in a way that maintains safe system operation. This response cannot be entirely software-based. You cannot count on a failed processor to mitigate its own problems, or even to react to the alerts. There must be an independent hardware mechanism in place.

A variety of architectures have evolved over the years to provide such an independent mechanism in processor-based systems. These architectures include a single processor with a hardware checker and two-processor designs in which the second processor is either of the same type as the main unit or of a different type. This second processor can operate independently, running the same or independent software, serving as a touchstone to validate the main processor's behavior on a cycle-by-cycle basis. The more popular alternative, though, is for the second processor to run in lockstep with the main unit, using the same code and data. However, the secondary processor will typically run on a slight delay from the primary, to avoid having both processors affected by a transient error on the system bus.

Figure 2: A variety of architectures have been developed that support the detection and mitigation of random processing errors. (Source: EE Times)
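The value of the delay in a lockstep pair can be seen in a toy model. The sketch below is an assumption-laden simplification: the `execute` function stands in for real instruction execution, the comparator is a software queue rather than hardware, and the glitch is modeled as the worst case in which the same bit flips in both cores on the same cycle.

```python
from collections import deque

MASK = 0xFFFFFFFF

def execute(state: int, operand: int) -> int:
    """Toy 'instruction': a 32-bit multiply-accumulate on the core's state."""
    return (state * 31 + operand) & MASK

def run_lockstep(operands, delay, glitch_cycle=None, glitch_bit=0):
    """Return the first cycle at which the comparator sees a mismatch, else None.

    The checker core runs `delay` cycles behind the primary. A glitch at
    `glitch_cycle` flips the same bit in BOTH cores (a common-cause transient).
    """
    primary = checker = 0
    pending = deque()  # primary results waiting for the delayed checker
    for cycle in range(len(operands) + delay):
        if cycle < len(operands):
            primary = execute(primary, operands[cycle])
        checker_ran = cycle >= delay
        if checker_ran:
            checker = execute(checker, operands[cycle - delay])
        if cycle == glitch_cycle:
            primary ^= 1 << glitch_bit
            checker ^= 1 << glitch_bit
        if cycle < len(operands):
            pending.append(primary)
        # Compare the checker's result with the primary's result for the
        # same instruction, which was produced `delay` cycles earlier.
        if checker_ran and pending.popleft() != checker:
            return cycle
    return None

ops = list(range(1, 9))
print(run_lockstep(ops, delay=2))                  # None: no fault, no mismatch
print(run_lockstep(ops, delay=0, glitch_cycle=4))  # None: both cores corrupted identically
print(run_lockstep(ops, delay=2, glitch_cycle=4))  # 4: the cores were on different instructions
```

With no delay, the common glitch corrupts both cores in the same way and the comparator sees nothing wrong. With a two-cycle offset, the same glitch lands on different instructions in each core, so the results diverge and the fault is flagged.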

What these architectures have in common is a need to make substantial additions to the basic processor design, including comparison hardware and possibly a full secondary processor. The advent of multi-core processors opened an opportunity for silicon vendors to offload much of this hardware design burden from system developers, and many have stepped up to the plate by introducing processors specifically designed for safety-critical applications. Many of these safety processors are marketed primarily to automotive designers working under the ISO 26262 standard for ASIL (automotive safety integrity level) certification, but are equally applicable to other safety-critical applications in industrial control, medical, military, and aerospace.

These providers go further than simply providing hardware features. They also offer designers assistance in implementing safe designs, diagnostic software libraries, and the traceability and verification documentation and development tools needed to support safety certification.

Here are some representative safety processor families currently on the market:

  • ARM Cortex-R52: Part of the ARMv8-R architecture, the R52 core gives ARM licensees the foundation features needed to implement a safety processor. The dual-core design can operate in lockstep mode for fault detection and has the option of an additional split configuration that allows the two cores to operate independently when needed. The core design also includes ECC on all bus and memory interfaces, capable of double-bit error detection and single-bit error correction. The core also offers high-coverage built-in self-test (BIST) capability and a licensable safety package to simplify product safety implementation.
  • Infineon Aurix: Containing up to three independent cores, Aurix family devices provide dual-lockstep processors implemented with additional architectural diversity. The two cores run the same code but have hardware design differences that aim to reduce the opportunity for common-cause errors to arise. The design differences help ensure that an event that creates an error on the main processor will not cause the same error on the comparison processor.

    The Aurix lockstep processor design from Infineon uses delayed execution of a common instruction and data stream to avoid having single-event upsets go undetected. (Source: Infineon)

  • Intel Xeon D-1529: Instead of targeting automotive applications, Intel's D-1529 aims to meet industrial needs under IEC 61508 safety integrity level (SIL) certification standards. The design includes redundant lockstep processor pairs, windowed watchdog timers, clock and power monitors, and processor temperature monitoring. The processors can support mixed safety-critical and non-critical task execution and offer diagnostic and error-detection logic on their PCI and SATA interfaces.
  • MIPS i6500-F core: This core design allows MIPS licensees to create safety processors based on configurable clusters of 64-bit CPUs. It includes parity checks on all buses, ECC on its RAM, and logic BIST support. It has been certified as a safety element out of context (SEooC) to ASIL level B, supporting designs aiming for certification as ASIL level D.
  • NXP S32S24: Targeting ASIL-D designs, the S32S247 uses four ARM Cortex-R52 lockstep cores with a hardware hypervisor to keep application program execution separate. The large (up to 64 Mbytes) integrated Flash memory allows the processor to hold multiple sets of application code in support of over-the-air software updates, and all memory interfaces include ECC.
  • STMicro SPC5: The SPC5 product line includes several variations, including lockstep, delayed lockstep, and decoupled parallel processing options. Processors include BIST hardware, with the SPC57S line additionally offering ECC on memory.
  • Texas Instruments Hercules: The Hercules family of safety processors has been certified compliant under IEC 61508 SIL level 3 and ISO 26262 ASIL level D using lockstep Cortex-R processors. In addition, they offer ECC on system memory, ECC or parity on select peripheral and DMA interfaces, CRC or parity on serial and network communications peripherals, on-chip clock and voltage monitoring, IO loopback and ADC self-test, and memory BIST. The error signaling module offers an external signal pin to facilitate additional system response to errors detected within the processor.
  • Xilinx Zynq 7000: While it is not actually a processor, the Zynq FPGA can be configured to provide two independent safety channels in a single device using design packages, methodologies, and tools certified for use in functional safety applications. The tools include support for isolated design flows that physically separate the redundant elements so that they do not share FPGA resources. Soft error mitigation IP is also available.
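Several of the families above mention ECC that corrects single-bit errors and detects double-bit errors. The sketch below shows the principle with an extended Hamming code on a 4-bit value. It is an illustration only: the article does not say which codes these processors use, real memory ECC typically protects much wider words (64 data bits with 8 check bits is a common arrangement), and the function names here are made up.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds overall parity; indexes 1..7 form a Hamming(7,4) code
    with parity bits at positions 1, 2, and 4.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code

def decode(code: list):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    word = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos          # XOR of set-bit positions locates a single flip
    parity_error = sum(word) % 2 == 1
    if parity_error:
        word[syndrome] ^= 1          # odd number of flips: treat as one and repair it
        status = "corrected"
    elif syndrome != 0:
        return "uncorrectable", None  # even number of flips: detect, do not guess
    else:
        status = "ok"
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble

word = encode(0b1011)
word[6] ^= 1                 # one upset
print(decode(word))          # ('corrected', 11)
word[2] ^= 1                 # a second upset in the same word
print(decode(word))          # ('uncorrectable', None)
```

The second case is the important one for safety: the hardware cannot repair the word, but it can still report that the data is untrustworthy instead of silently passing it on.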

Choosing a safety processor is only the beginning, however. Developers of safety-critical systems will still need to adopt a design and evaluation methodology for both hardware and software that rigorously evaluates the potential for errors to occur and validates the system design for resilience to such errors. Safety-targeted processors and the support their vendors provide, though, go a long way toward easing that developer burden.

Check out these other stories in the safety of autonomous vehicles Special Project:

Autonomous vehicles: The electronics road to making them safe
Explore tools and technologies available to make AVs safe, including pedestrian path prediction, functional safety, cameras/lidars/radars, and V2X.

How Are We Going to Monitor Drivers?
Euro NCAP wants Driver Monitoring Systems (DMS) as a primary safety standard by 2020. Meanwhile, recent Uber and Tesla crashes substantially heightened the importance of DMS. Now car OEMs are scrambling.

Uber Fatality Sends AVs Back to Safety 101
An NTSB preliminary report exposes two issues. One is the immaturity of Uber’s AV software stack. Another is the absence of an Uber safety strategy in creating its AV testing platform.

Robocar Testing: It's Simulation, Stupid!
Why do you need simulation? It’s because you can still miss the ground truth even with millions of miles. 
