Building advanced Cortex-M3 applications
By Jean J Labrosse, Lotta Frimanson and Anders Lundgren
(04/08/09, 12:36:00 H EDT)
The ARM Cortex-M3 architecture provides many improvements compared with its predecessor, the popular ARM7/9, and is designed to be particularly suitable for cost-sensitive embedded applications that require deterministic system behavior.
This article describes how developers can best utilize the advanced capabilities of the Cortex-M3 when designing embedded applications.
Comparing ARM7/9 to Cortex-M3
A comparison of the main characteristics of Cortex-M3 with those of ARM7/9 is shown in Table 1 below.
The Cortex-M3 improves on the ARM7/9 in most qualitative respects: a simpler stack architecture, a better interrupt controller, and a higher-performance instruction set, as well as enhanced debug capabilities, all of which can significantly affect end-product performance.
Stacking and Interrupts
The task context is automatically saved on the process stack when an exception occurs, at which point the processor moves to Handler mode and the main stack becomes active. On return from the exception, the task context is restored and Thread mode is reinstated if the interrupted task remains the active task. If, however, a new task is to be scheduled, a context switch must take place. Because the task context is already saved, this procedure is more straightforward with the Cortex-M3 and also consumes 50% fewer processor cycles.
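To make the stacking behavior concrete, here is a minimal sketch in C of the saved context. The eight-register hardware frame (R0-R3, R12, LR, PC, xPSR) is what the Cortex-M3 pushes on exception entry; the split into `hw_frame_t` and `sw_frame_t`, the ordering of the two parts, and the helper `task_stack_init()` are illustrative assumptions, not taken from any particular RTOS port. Stack alignment requirements are ignored here for brevity.

```c
#include <stdint.h>
#include <string.h>

/* Registers the Cortex-M3 hardware pushes on exception entry,
   listed from the lowest stack address upward. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;
    uint32_t pc;    /* address at which the task resumes */
    uint32_t xpsr;
} hw_frame_t;

/* Registers a context-switch handler still has to save in software. */
typedef struct {
    uint32_t r4, r5, r6, r7, r8, r9, r10, r11;
} sw_frame_t;

/* Full saved context as it might sit on a task's process stack
   (hypothetical layout: software-saved part at the lower addresses). */
typedef struct {
    sw_frame_t sw;
    hw_frame_t hw;
} task_context_t;

/* Prepare a fresh task stack so that the first exception return into the
   task starts at entry_addr with arg in R0. Returns the initial stack
   pointer, which points at the lowest word of the saved context. */
uint32_t *task_stack_init(uint32_t *stack_top, uint32_t entry_addr, uint32_t arg)
{
    task_context_t *ctx = (task_context_t *)(void *)stack_top - 1;

    memset(ctx, 0, sizeof *ctx);
    ctx->hw.r0   = arg;
    ctx->hw.pc   = entry_addr;
    ctx->hw.xpsr = 0x01000000u;  /* Thumb state bit must be set */
    return (uint32_t *)(void *)ctx;
}
```

Because half of the context is already in place when the handler runs, a switch only has to push and pop the eight registers in `sw_frame_t` and swap the process stack pointer.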
Migration between processors
The NVIC saves half the processor registers automatically upon interrupt and restores them upon exit, allowing for efficient interrupt handling. It also removes the need for saving and restoring registers during back-to-back interrupts. The NVIC also integrates the SysTick, a 24-bit down-counting timer intended for RTOS use.
The NVIC and SysTick peripherals ease the migration between Cortex-M3 processors, particularly when an RTOS is used, because the port simply requires a function that returns the clock frequency on which the SysTick timer is based. In contrast, an RTOS port to an ARM7TDMI-S processor would require a port to the interrupt controller of the processor and a port to a hardware timer in addition to the generic ARM port. The interrupt functionality, which must be written individually for each ARM7/9 port, is provided just once for all Cortex-M3 implementations.
The sleep mode feature of the Cortex-M3 can be used to conserve power when the target application is idle. For example, with µC/OS-II the idle task calls an application-level hook that causes the processor to enter sleep mode until the next interrupt is received.
Unlike most previous ARM processors, the Cortex-M3 also has a fixed memory map.
Instruction set
The Cortex-M3 includes 36 instructions not available on the ARM7/9, including CLZ (count leading zeros), which is particularly useful for kernel scheduling algorithms. An optimized version of the scheduling algorithm for µC/OS-II, written in assembly language using the CLZ and RBIT instructions, can find the highest-priority ready task within about 25 clock cycles. That is about twice as fast as the equivalent optimization that can be done with an ARM7/9, and 3-4 times faster than the same algorithm written in C.
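The idea behind the CLZ/RBIT scheduling lookup can be sketched in portable C. This is an illustration only, not the µC/OS-II assembly routine: the two-word ready bitmap, the function name `highest_ready_prio()` and the `PRIO_NONE` value are assumptions, and `soft_clz()` and `soft_rbit()` are software stand-ins for what would each be a single instruction on the Cortex-M3. The sketch assumes that bit n set means the task at priority n is ready and that a lower number means a higher priority.

```c
#include <stdint.h>

/* Portable stand-in for the CLZ instruction: number of leading zero bits. */
static uint32_t soft_clz(uint32_t x)
{
    uint32_t n = 0;

    if (x == 0u) {
        return 32u;
    }
    while ((x & 0x80000000u) == 0u) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Portable stand-in for the RBIT instruction: reverse the bit order. */
static uint32_t soft_rbit(uint32_t x)
{
    uint32_t r = 0;

    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

#define PRIO_NONE 64u  /* returned when no task is ready (hypothetical) */

/* Find the lowest-numbered set bit across a 64-entry ready bitmap.
   Reversing the word turns "lowest set bit" into "leading zero count". */
uint32_t highest_ready_prio(const uint32_t ready[2])
{
    if (ready[0] != 0u) {
        return soft_clz(soft_rbit(ready[0]));
    }
    if (ready[1] != 0u) {
        return 32u + soft_clz(soft_rbit(ready[1]));
    }
    return PRIO_NONE;
}
```

For example, a bitmap whose first word is 0x00000050 (priorities 4 and 6 ready) yields priority 4. On the target, the two helper loops collapse into one RBIT and one CLZ per word, which is where the cycle savings come from.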
Debug and trace
In contrast to the ARM7/9, the Cortex-M3, which contains a subset of the new ARM CoreSight debug technology, has 6 code breakpoints and 4 general-purpose watchpoints, providing enough breakpoints for most debugging scenarios. The Cortex-M3 also allows live access to the core while the application is running, making it possible to read and write memory and to set or clear breakpoints on a running application.
Debug functional units
There are five main functional units that implement the Cortex-M3 debug logic (Figure 2, below):
* DWT (Data Watchpoint and Trace): provides a set of functions that collect information from the system buses and generate events to the ITM/ETM units. These functions are: four independent watchpoints; program counter sampler; interrupt trace; and CPU statistics.
* ETM (Embedded Trace Macrocell): an optional unit that provides high-bandwidth instruction trace data over a dedicated 4-bit high-speed trace bus using a special hardware probe such as IAR J-Trace for Cortex-M3.
* FPB (Flash Patch and Breakpoint): implements the logic for 6 code breakpoints, and contains logic to patch 8 instructions.
* ITM (Instrumentation Trace Macrocell): the formatter for events originating from the DWT. ITM packets from 32 I/O ports can be picked up by the debugger in real time, and can transmit internal status and statistics on RTOS kernels, TCP/IP stacks and other middleware software.
* DAP (Debug Access Port): receives data from the other units and combines and routes the information to the available debug ports: JTAG, SWD (Serial Wire Debug), SWO (Serial Wire Output), and trace port.
SWD is the preferred debug interface when debugging with Cortex-M3, but taking full advantage of its features requires a probe with full SWD/SWO support, such as IAR J-Link version 7 or later, which is capable of running at SWO speeds of up to 6 Mb/s.
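As an illustration of how application code might feed one of the ITM I/O (stimulus) ports listed above, here is a minimal sketch in C. The register addresses given in the comment are the standard Cortex-M3 locations; the `itm_view_t` structure and the `itm_putc()` function are hypothetical names introduced for this example, and the registers are reached through pointers so that the same code can be exercised on a host with ordinary variables standing in for the hardware.

```c
#include <stdint.h>

/* Pointers to the three ITM registers this sketch touches. On real
   Cortex-M3 hardware they live at 0xE0000000 (stimulus port 0),
   0xE0000E00 (trace enable register) and 0xE0000E80 (trace control
   register). */
typedef struct {
    volatile uint32_t *stim0;  /* stimulus port 0 */
    volatile uint32_t *ter;    /* bit n enables stimulus port n */
    volatile uint32_t *tcr;    /* bit 0 enables the ITM as a whole */
} itm_view_t;

/* Send one character through stimulus port 0.
   Returns 0 on success, or -1 if tracing is not enabled, in which case
   nothing is written and the caller loses no time waiting. */
int itm_putc(const itm_view_t *itm, char ch)
{
    if ((*itm->tcr & 1u) == 0u || (*itm->ter & 1u) == 0u) {
        return -1;
    }
    /* The port reads as non-zero once it can accept another write. */
    while (*itm->stim0 == 0u) {
        /* busy-wait */
    }
    /* A byte-wide write produces a one-byte trace packet. */
    *(volatile uint8_t *)itm->stim0 = (uint8_t)ch;
    return 0;
}
```

On a target, the three pointers would be set to the addresses given in the comment; a printf-style routine could then be layered on top by calling `itm_putc()` for each output character.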
Using Cortex-M3 debug features
The function profiler, shown in Figure 3 below, helps find the functions where most time is spent during execution, which are the parts to focus on when spending time and effort on optimizing code. In a system without trace capabilities, this would require the debugger to set breakpoints at each entry to and exit from functions.
The DWT allows the debugger to sample the PC and provide statistical profiling. A PC sampling rate of around 10,000 samples per second is good enough to find CPU-intensive functions in the application, although not to provide a precise profile. The timing information for each function in an application can then be displayed in different formats in IAR Embedded Workbench while the application is running. The statistical trace information received via SWO can also be used to provide information about the number of times each instruction in the code has been executed. For example, in IAR Embedded Workbench, the instruction profiling information is displayed in the Disassembly window (Figure 4 below), where the leftmost column shows the number of times each instruction has been executed.
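The principle of statistical profiling from PC samples can be sketched on the host side in a few lines of C. This is an illustration of the technique only, not how IAR Embedded Workbench implements it; the `func_range_t` type, the `profile_samples()` function and the address ranges in the usage note are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per function, as a debugger could derive from symbol data. */
typedef struct {
    const char *name;
    uint32_t    start;  /* first address of the function */
    uint32_t    end;    /* one past the last address */
    uint32_t    hits;   /* number of PC samples that fell inside */
} func_range_t;

/* Attribute each sampled PC value to the function containing it.
   Returns the number of samples that matched no known function. */
size_t profile_samples(func_range_t *funcs, size_t nfuncs,
                       const uint32_t *pcs, size_t npcs)
{
    size_t unmatched = 0;

    for (size_t i = 0; i < npcs; i++) {
        int found = 0;

        for (size_t f = 0; f < nfuncs; f++) {
            if (pcs[i] >= funcs[f].start && pcs[i] < funcs[f].end) {
                funcs[f].hits++;
                found = 1;
                break;
            }
        }
        if (!found) {
            unmatched++;
        }
    }
    return unmatched;
}
```

Dividing a function's `hits` by the total number of samples estimates its share of CPU time. At roughly 10,000 samples per second each sample stands for about 100 microseconds of execution, which is why short functions may be missed and the result is a statistical picture rather than an exact one.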
The four watchpoints in the DWT module can be used to log accesses to up to four different memory locations or areas, including time information. This helps both in debugging the application and in deciding which data to place in more efficient memory, making the application more efficient. By using the DWT module to trigger an ITM packet for each interrupt activity, the debugger can present logs and graphs of the interrupt activity in the system, helping, for example, to locate which interrupts can be fine-tuned to make execution faster. Figure 5 below shows the Interrupt Log window in IAR Embedded Workbench. A condensed summary for each interrupt source is also available.
The ITM module also offers a non-intrusive printf() function that reduces the overhead for this mechanism to around 100 microseconds, compared to a couple of hundred milliseconds when using the traditional breakpoint-driven semihosting method. The increased number of breakpoints in the Cortex-M3 means that data breakpoints, which are useful for tracking down bugs that involve corrupted variables or data, can be used at the same time as code breakpoints are active. The DAP in Cortex-M3 allows full access to the core buses during application execution, enabling the debugger to perform live memory reads and writes and to implement live watch on application variables. IAR Embedded Workbench uses this feature in the Live Watch window and the Memory window.
Building applications
Using off-the-shelf software components can save many man-years of software development, allowing a project to be completed in a matter of days.
The Cortex-M3 processor has been specifically designed for cost-sensitive embedded applications. The new features aim to make software on Cortex-M3s more efficient, and also make it easier to migrate from one controller to another, or to port an RTOS to a new platform. The instruction set includes helpful new instructions, such as CLZ, that can improve assembly-language implementations of common algorithms, and the processor provides a sleep mode that it can enter when idle, to be woken when an interrupt occurs. Finally, the debug controller makes developing and testing software easier. Not only is the architecture sensible, stable and efficient; its designers also aimed to provide a developer-friendly platform, with a sophisticated debug system and six flash breakpoints that are immensely helpful during testing and development, and powerful trace features that allow greater real-time visibility into application operation.
Jean Labrosse is President of Micrium, a provider of high quality embedded software solutions. Mr. Labrosse is a regular speaker at the Embedded Systems Conferences and serves on the Advisory Board of the conference. Jean is the author of two books, "MicroC/OS-II, The Real-Time Kernel" and "Embedded Systems Building Blocks, Complete and Ready-to-Use Modules in C", and has written numerous articles for magazines. He has an MSEE and has been designing embedded systems for many years.
Anders Lundgren has been with IAR Systems since 1997. He currently works as product manager for the IAR Embedded Workbench for ARM. During his first years with IAR Systems he worked on compiler development and as project manager for compiler and debugger projects. Prior to joining IAR Systems, Mr. Lundgren worked with space science instruments at the European Space Agency and spent one year at the space science laboratory at the University of California, Berkeley. He received an M.S. in Computer Science from the University of Uppsala, Sweden, in 1986.
Lotta Frimanson received a Master of Science degree in Engineering Physics from Uppsala University, Sweden, in 1989. She has worked at IAR Systems as a product manager since 1999. Prior to this she had 10 years of experience in embedded systems programming.