Software development for today’s embedded microcontrollers can be a daunting task without adequate development tools, and compilation tools are a critical component. Embedded software development for microcontrollers is distinctly challenging due to the lack of pre-existing software infrastructure, such as a robust operating system that abstracts away the details of the hardware. Software development for these targets often takes place ‘close to the metal,’ meaning close interaction between the developer and the machine on which the software will execute. An ideal compilation toolchain, typically consisting of a compiler, linker, assembler, and libraries, can make embedded software development tasks simpler and shorten development time. This article will provide an overview of features that you may want to consider when determining the ideal compilation toolchain for your next microcontroller project.
One of the most important aspects of a compilation toolchain is how well it supports the device which you are targeting. Embedded microcontrollers vary widely in terms of processor configuration and features. If your toolchain is not optimized for your target, you may not be getting the maximum value out of your device, or your toolchain. Here are a few examples:
- Support for coprocessor features of your device. If your compiler has no concept of a particular hardware accelerator and therefore doesn’t generate code that takes advantage of a floating point unit or crypto accelerator, then the generated code may be far less efficient.
- Optimized instruction scheduling. This class of performance optimization maximizes instruction-level parallelism by making the most efficient use of the instruction pipeline and branch predictor of a given MCU.
- Handling of machine registers. A compiler generates code which uses machine registers to hold operands and intermediate results and to move data in and out of memory. If the compiler doesn’t consider the specific number and layout of the registers, it may generate inefficient code in which register values are shuffled back and forth to main memory (register spilling), with a significant impact on performance.
- Efficient use of cache prefetching. Some MCUs can speculatively prefetch code or data into the cache. A toolchain with knowledge of this hardware feature can improve your system performance. For example, an optimized runtime library may invoke the prefetch unit when moving blocks of data.
- Target-specific built-in functions. Often referred to as intrinsics, these look like ordinary C function calls, but the compiler replaces them with device-specific instructions which make better use of the underlying hardware. For example, a compiler may support intrinsics for targeting the single instruction multiple data (SIMD) coprocessor within a microcontroller, making it easier for the programmer to take advantage of parallel operations without having to know the low-level details of the machine.
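As an illustration of the built-in functions item above, here is a minimal sketch in C. It assumes a GCC- or Clang-compatible compiler, which provides `__builtin_clz`; other toolchains offer similar functionality under different names. The helper name `count_leading_zeros` is made up for this example. On a target with a count-leading-zeros instruction, the built-in would typically compile down to that instruction, while the fallback loop shows what the programmer would otherwise write by hand.

```c
#include <stdint.h>

/* Hypothetical helper: number of leading zero bits in a 32-bit value.
 * Assumes `unsigned int` is 32 bits wide, as on most 32-bit MCUs. */
static inline unsigned count_leading_zeros(uint32_t x)
{
    if (x == 0u) {
        return 32u;   /* the built-in is undefined for 0, so handle it here */
    }
#if defined(__GNUC__) || defined(__clang__)
    /* GCC/Clang built-in: maps to a hardware instruction when one exists. */
    return (unsigned)__builtin_clz(x);
#else
    /* Portable fallback: scan down from the most significant bit. */
    unsigned n = 0u;
    while ((x & 0x80000000u) == 0u) {
        x <<= 1;
        n++;
    }
    return n;
#endif
}
```

The call site looks identical in both cases, which is the point of an intrinsic: the source stays in C while the compiler chooses the instruction.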
There are plenty of other examples of how a toolchain can take advantage of MCU hardware features. How does one tell whether a compiler is optimized for a particular target? Trawling through generated assembly code is one way, but this requires in-depth knowledge of the MCU architecture and a lot of time. Benchmarks are another way to determine the efficiency of code generation for a particular target. Unfortunately, many synthetic benchmarks, such as CoreMark or Dhrystone, are not broad enough to represent the wide range of software constructs that may show up in a typical embedded application. The best way to evaluate the efficiency of the compiler is to benchmark using your own code, or at least a representative sample of your code. For example, if you are writing a signal-processing application, you may want to consider a benchmark which includes common algorithms such as FFTs or bilinear transformations.
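A benchmark built around your own code can be very small. The sketch below is one possible harness, not a prescribed method: `seconds_per_call` and `fir_kernel` are hypothetical names, the FIR loop is only a stand-in for your real workload, and the timing uses the standard C `clock()` function. On a bare-metal MCU you would likely substitute a hardware timer or cycle counter.

```c
#include <stddef.h>
#include <time.h>

typedef void (*kernel_fn)(void);

/* Average CPU seconds per call of `kernel`, measured over `iterations` runs. */
static double seconds_per_call(kernel_fn kernel, unsigned iterations)
{
    if (kernel == NULL || iterations == 0u) {
        return 0.0;
    }
    clock_t start = clock();
    for (unsigned i = 0u; i < iterations; i++) {
        kernel();
    }
    clock_t end = clock();
    return ((double)(end - start) / (double)CLOCKS_PER_SEC) / (double)iterations;
}

#define N_SAMPLES 256
#define N_TAPS    8

/* Volatile sink so the compiler cannot discard the computation. */
static volatile long fir_sink;

/* Stand-in workload: an 8-tap FIR filter over a small sample buffer. */
static void fir_kernel(void)
{
    static int samples[N_SAMPLES];
    static const int taps[N_TAPS] = {1, 2, 3, 4, 4, 3, 2, 1};
    long acc = 0;

    for (int i = 0; i < N_SAMPLES; i++) {
        samples[i] = i & 0xFF;
    }
    for (int n = N_TAPS - 1; n < N_SAMPLES; n++) {
        for (int k = 0; k < N_TAPS; k++) {
            acc += (long)taps[k] * samples[n - k];
        }
    }
    fir_sink = acc;
}
```

Calling `seconds_per_call(fir_kernel, 1000u)` with the same source built by different compilers, or at different optimization levels, gives a like-for-like comparison on the code you actually care about.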
There are additional toolchain features which can greatly reduce your software development time. Before I go into detail about those specific features, let’s look at the general topics of power and code size, two key factors when designing embedded systems.
Power
Embedded MCUs are often battery powered, so optimizing power consumption is an important aspect of system design. Generally speaking, static compilers cannot predict the power consumption of a system, which makes it difficult to employ optimizations that specifically reduce power in a deterministic way across various embedded designs. That doesn’t mean compilers are helpless. Simply reducing the number of instructions executed will generally reduce power consumption, and this can be achieved with various optimization techniques, e.g. common sub-expression elimination or judicious application of loop unrolling. The faster your program executes, the longer the device can spend in a low-power idle state. Code size is also a factor in power consumption: the smaller the compiled target image, the less memory is required and therefore the less power is needed to keep that memory energized during operation.
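To make the two optimizations named above concrete, the following sketch shows the same computation before and after they are applied. An optimizing compiler would normally perform these transformations itself; they are written out by hand here purely for illustration, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* As the programmer might write it: x[i] * gain is evaluated twice per sample. */
static int64_t energy_naive(const int16_t *x, size_t n, int16_t gain)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int64_t)((int32_t)x[i] * gain) * ((int32_t)x[i] * gain);
    }
    return acc;
}

/* After common sub-expression elimination (the product is computed once)
 * and unrolling by two (half as many loop-control instructions). */
static int64_t energy_optimized(const int16_t *x, size_t n, int16_t gain)
{
    int64_t acc = 0;
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        int32_t t0 = (int32_t)x[i] * gain;       /* shared sub-expression */
        int32_t t1 = (int32_t)x[i + 1] * gain;
        acc += (int64_t)t0 * t0;
        acc += (int64_t)t1 * t1;
    }
    for (; i < n; i++) {                         /* leftover element, if any */
        int32_t t = (int32_t)x[i] * gain;
        acc += (int64_t)t * t;
    }
    return acc;
}
```

Both functions return the same result. The second executes fewer multiplies and branches at the cost of a slightly larger loop body, which is the size-versus-speed trade-off the compiler has to weigh.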
Size
Besides its impact on power consumption, memory is one of the key cost drivers for embedded microcontrollers. Although MCU prices have fallen dramatically, moving to a microcontroller with more on-chip flash memory can still increase your BOM cost significantly. For example, the unit price of the 64 Kbyte flash variant of a popular microcontroller is $2.80 whereas the 128 Kbyte variant is $4.88, roughly 74% more per unit.
Oftentimes there is a trade-off between code size and performance, and many compilers simply default to performance over code size, opting to inline functions or unroll loops at every opportunity. For example, function inlining replaces a function call with the body of the function, removing the overhead of the call and return and of saving and restoring information on the stack. Inlining can have a dramatic impact on code size because the function body is replicated at each call site. For small functions, however, the body may be no larger than the call sequence it replaces, so inlining can save both stack frame space and the call/return overhead. A good compiler for embedded applications applies these optimizations more selectively, using heuristics to determine when and where to inline functions to give maximum performance with minimal code size increase.
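The sketch below shows how a developer might express the same judgment by hand. It assumes a GCC- or Clang-compatible compiler for the `noinline` attribute, and the macro and function names are invented for this example. Note that `inline` in C is only a hint; the compiler’s own heuristics still make the final decision.

```c
#include <stdint.h>

/* `noinline` is a GCC/Clang extension; on other compilers the macro is empty. */
#if defined(__GNUC__) || defined(__clang__)
#define NEVER_INLINE __attribute__((noinline))
#else
#define NEVER_INLINE
#endif

/* Tiny helper: the body is about as small as a call sequence,
 * so inlining costs little flash and removes the call overhead. */
static inline uint8_t clamp_u8(int32_t v)
{
    if (v < 0)   return 0u;
    if (v > 255) return 255u;
    return (uint8_t)v;
}

/* Loop-based routine that may be called from many places: keeping a
 * single out-of-line copy avoids replicating the loop at every call site.
 * Bitwise CRC-16 update, polynomial 0x1021, most significant bit first. */
static NEVER_INLINE uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)((uint16_t)byte << 8);
    for (int bit = 0; bit < 8; bit++) {
        if (crc & 0x8000u) {
            crc = (uint16_t)((crc << 1) ^ 0x1021u);
        } else {
            crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

Comparing the linker map or image size with and without such annotations is a quick way to see how much flash a given inlining decision costs.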
Let’s now have a closer look at additional toolchain features which can greatly reduce software development time.
- Linker support for precise code and data placement. Embedded systems oftentimes have a variety of memory types, such as ROM, DRAM, or SRAM. To maximize system resources, the embedded developer may want to precisely place code and data within the different memory locations. For example, the developer may want to put ‘hot’ code in faster RAM. An ideal embedded development toolchain might have a programmable linker which precisely controls the location of code and data on the target. Figure 1 is an illustration of how a linker can be programmed to precisely control placement of code and data within memory.
Figure 1: Components and organization of a “scatter file” for Keil MDK-ARM.
- Support for building position-independent code. Most MCUs don’t have a dedicated memory management unit (MMU), so developers may need to design code that the operating system can load at any address without stomping on other applications in the same address space. This is simplified by a toolchain which automatically replaces absolute addresses with relative addresses so the code can be executed from any location within physical memory.
- Size optimized libraries. Some toolchains come with size-optimized runtime libraries. By excluding unneeded portions of the standard C library these ‘microlibs’ can significantly reduce the image size of your embedded application thus preserving precious system memory.
- Runtime startup code. C runtime startup code handles functions such as allocating space for the stack and heap and zeroing uninitialized data. The developer of course could write this from scratch, but time is saved when it is provided as part of the toolchain.
- Support for assembly inlining. This feature enables the developer to embed assembly instructions within C source code, providing more fine-grained control over machine execution and potentially improving software execution in certain cases. The compiler must understand machine-specific assembly syntax mixed with the high level C/C++ source code.
- Diagnostic capabilities. The compiler front-end should emit warnings or fail to compile when presented with invalid or incorrect source code sequences. By identifying mistakes like divide by 0 or arithmetic operations on incompatible data types, error and warning messages from the toolchain can save the developer a lot of troubleshooting time. Equally important is how well the tool communicates to the user where the erroneous input has occurred. Some compilers simply indicate the line number where an error has occurred, but if that line contains a complex expression, the developer may still have little clue about the root cause of the error. An ideal compiler will indicate line and column information along with an accurate description of the infraction.
- Support for safety-related software development. As MCUs find their way into more safety-related applications, the number of software-programmable devices in automobiles, medical equipment, and industrial equipment is growing dramatically. The code generator is a critical component: a defect in it can compromise the safety of the entire system. The ideal compiler in this case should be mature, stable, and predictable. The toolchain vendor might provide evidentiary information about the robustness of the toolchain such as defect reports, test reports, and a safety manual describing how to use the toolchain in the most deterministic way. Alternatively, some vendors may provide pre-certified tools based on a particular safety standard.
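To give a sense of what the runtime startup code mentioned in the list above does, here is a minimal sketch of its two core memory steps: copying initialized data from its load location to RAM and zeroing uninitialized data. The function name and parameters are hypothetical. In real startup code the five addresses would come from linker-defined symbols whose names vary by toolchain, and stack and heap setup would happen alongside this.

```c
#include <stdint.h>

/* Hypothetical helper showing the memory steps startup code performs
 * before main() runs. All five pointers would normally be supplied by
 * the linker; here they are ordinary parameters so the logic can be
 * exercised on any host. */
static void init_ram_sections(const uint32_t *data_load,
                              uint32_t *data_start, uint32_t *data_end,
                              uint32_t *bss_start, uint32_t *bss_end)
{
    /* Copy initial values of initialized variables from flash to RAM. */
    const uint32_t *src = data_load;
    for (uint32_t *dst = data_start; dst < data_end; dst++) {
        *dst = *src++;
    }

    /* Zero the region holding uninitialized variables, as C requires. */
    for (uint32_t *dst = bss_start; dst < bss_end; dst++) {
        *dst = 0u;
    }
}
```

A toolchain that ships this code, already matched to its own linker’s symbol names and memory layout, spares the developer from writing and debugging it for each new device.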
Support for your target architecture and specific product features are just a subset of factors which make up an ideal compilation toolchain for embedded software development. Correctness of translated code, quality of debug information, ease of use, timely technical support, clear documentation, and product stability will all have an impact on the time required to complete your project.
The ARM Compiler has been a leader in the embedded market for more than 20 years and represents a state-of-the-art compiler for microcontroller software development, supporting all ARM-based devices. The ARM Compiler is a component of both the Keil Microcontroller Development Kit (MDK) and ARM Development Studio 5 (DS-5) and has been used to build software running on literally billions of embedded devices.