Do you know your Arm cores?

By Mark Patrick, Mouser Electronics

It is no exaggeration to say that the Arm architecture has helped shape the technology landscape since it was introduced in the 1980s. In particular, it has largely redefined the embedded sector over the last three decades, as designs moved from 8-bit to 32-bit processing. But it is important to appreciate that the Arm architecture applies to more than low-power IoT devices. Thanks to the company’s strategy, there are processor cores suitable for almost every application across all sectors.

In those early days it was one of the few 32-bit architectures targeting deeply embedded applications, but today it is commonplace. The number of licensees for the Arm architecture continues to increase and includes almost all semiconductor device manufacturers and even more OEMs. This market penetration means that a vast number of Arm-based processors, microcontrollers, ASICs, and ASSPs are in use today. While each is unique in its own way, they are fundamentally similar in their instruction set architecture. Announced in 2011, Version 8 of the Arm architecture (or, more simply, Arm v8) offers 64-bit performance with backward compatibility with its 32-bit heritage.

Application Processors for smarter cellular communications

The Cortex-A family, which appeared in 2005, found fame by being largely responsible for the shift from simple mobile phones to smartphones that now dominate the sector. But its performance was not limited to more capable portable devices. The features and benefits of the Cortex-A meant it was rapidly discovered and adopted by the developers of servers and other high-performance computing systems.

A key reason for its almost instant success was the inclusion of a memory management unit (MMU) that supports a paged memory structure. This was important because paging enables virtual address spaces, which provide greater robustness in a multitasking environment. An MMU is also a requirement for running Linux and any operating system based on Linux. However, one of the penalties of using paged virtual memory is the impact it can have on the timing of real-time applications, which means it is less common in devices intended for hard real-time applications.
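To make the idea of a paged virtual address space more concrete, here is a minimal sketch in Python. It is a toy model rather than a description of how any Cortex-A MMU is implemented: the 4 KiB page size, the single-level table held in a dictionary, and the names `translate`, `PageFault`, `task_a`, and `task_b` are all assumptions made for illustration. Real MMUs use multi-level tables held in memory and cache recent translations in hardware.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages, chosen for illustration


class PageFault(Exception):
    """Raised when a virtual page has no mapping in the task's table."""


def translate(page_table, virtual_address):
    """Translate a virtual address using a single-level page table.

    page_table maps virtual page numbers to physical frame numbers.
    """
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        raise PageFault(hex(virtual_address))
    return page_table[page_number] * PAGE_SIZE + offset


# Two hypothetical tasks, each with its own table. The same virtual
# address resolves to a different physical frame for each task.
task_a = {0: 5, 1: 9}
task_b = {0: 7}

print(hex(translate(task_a, 0x0010)))  # 0x5010
print(hex(translate(task_b, 0x0010)))  # 0x7010
```

Because each task sees only the frames listed in its own table, a stray pointer in one task cannot reach another task's memory, which is the robustness benefit described above. The extra lookup on every access also hints at why paging can make timing harder to predict.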

Another important addition to the Cortex-A architecture was Arm’s TrustZone technology. This is a security feature implemented at the hardware level to give a hypervisor greater control over the processor’s features and memory. This means that any task that hasn’t been granted the necessary privileges can be restricted from accessing them. The TrustZone technology is also used to protect the processor’s security features, such as cryptography, by running them in a virtual processor that is placed behind a firewall implemented in hardware.

In 2011, Arm introduced another significant development: the concept of the big.LITTLE architecture. This is a multicore approach that marries multiple Cortex-A cores to offer high performance when needed, without sacrificing low-power operation when performance can be reduced. This concept has become more viable with the introduction of higher-performing cores such as the Cortex-A72. This is a superscalar processor that can issue three instructions at the same time while still providing out-of-order execution. It is now common to see a highly integrated solution feature as many as four high-end Cortex-A cores alongside a lower-end core, such as the Cortex-A5 or -A7, in a big.LITTLE configuration to provide high performance and low power in a single package.
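The trade-off behind this kind of pairing can be sketched with a very simple policy that moves work between a high-performance cluster and a low-power one. The Python below is only an illustration under stated assumptions: the function name `pick_cluster`, the load thresholds, and the use of hysteresis are hypothetical, and production schedulers weigh far more than a single load figure.

```python
def pick_cluster(load, current="little", up_threshold=0.8, down_threshold=0.3):
    """Choose a cluster for the next interval using simple hysteresis.

    load is the measured utilisation from 0.0 to 1.0. The thresholds are
    assumed values, not figures from any real scheduler.
    """
    if current == "little" and load > up_threshold:
        return "big"      # demand is high: move to the faster cores
    if current == "big" and load < down_threshold:
        return "little"   # demand has dropped: move back to save power
    return current        # otherwise stay put to avoid ping-ponging


# Walk through a hypothetical sequence of load samples.
cluster = "little"
history = []
for sample in [0.2, 0.9, 0.5, 0.1]:
    cluster = pick_cluster(sample, current=cluster)
    history.append(cluster)

print(history)  # ['little', 'big', 'big', 'little']
```

The gap between the two thresholds is a design choice: it keeps a workload that hovers around a single cut-off from bouncing between clusters on every sample.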

Real-Time Performance from an Arm core

Perhaps the least known family of Arm cores is the Cortex-R. This is a testament to the success of the Cortex-A and Cortex-M, rather than a reflection of the features and benefits offered by the Cortex-R. Indeed, this branch of the Arm core tree has seen great success in its target applications, which are predominantly automotive and industrial control, as well as cyber-physical systems.

As these applications may suggest, the Cortex-R has focused on combining real-time performance and high reliability, which it does through its high level of deterministic execution. In simpler terms, many processor architectures use a cache, or short-term local storage, to hold instructions and/or data that are likely to be used again soon. By keeping these in the cache memory, the overhead of fetching them from the main memory can be avoided, which speeds up execution. However, in real-time applications, the penalty of a cache miss, or not having the instruction or data ready, makes execution time less deterministic. The Cortex-R family addresses this by offering tightly coupled memory (TCM), which is a more deterministic approach that better supports real-time execution.
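The difference in determinism can be illustrated with a small timing model. The sketch below is a simplification under stated assumptions: the cycle counts, the direct-mapped organisation, the cache geometry, and the function names `cache_latencies` and `tcm_latencies` are all hypothetical and do not describe any particular Cortex-R implementation.

```python
HIT_CYCLES = 1    # assumed cost of a cache hit
MISS_CYCLES = 20  # assumed cost of a miss that goes to main memory
TCM_CYCLES = 1    # assumed fixed cost of a tightly coupled memory access


def cache_latencies(addresses, num_lines=4, line_size=16):
    """Return the latency of each access through a direct-mapped cache."""
    tags = [None] * num_lines
    latencies = []
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines
        if tags[index] == block:
            latencies.append(HIT_CYCLES)
        else:
            tags[index] = block  # the new block evicts the old one
            latencies.append(MISS_CYCLES)
    return latencies


def tcm_latencies(addresses):
    """Every TCM access costs the same, whatever came before it."""
    return [TCM_CYCLES for _ in addresses]


# 0x00 and 0x40 map to the same cache line, so they evict each other.
trace = [0x00, 0x04, 0x40, 0x00, 0x44]

print(cache_latencies(trace))  # [20, 1, 20, 20, 20]
print(tcm_latencies(trace))    # [1, 1, 1, 1, 1]
```

In this model the cache's latency depends on the history of earlier accesses, so the worst case is hard to bound, whereas the TCM figure is the same on every access.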

The first Cortex-R core was the R4, which has seen great uptake in hard real-time applications. Since then, Arm has extended the family to include the R5 and R7, which added low-latency peripherals and ports. The ports are linked to peripherals through a high-performance, low-latency bus; often, licensees will choose a bus optimized for the core and peripherals, such as the AHB (Advanced High-performance Bus) or AXI (Advanced eXtensible Interface).

Other important features related to the target applications include error correction coding and the ability to operate in lock-step. This means two processors can run side by side, executing the same code at the same time, while an on-chip monitor compares their outputs. If one processor experiences an error, the mismatch is identified immediately and the system can respond before the fault propagates. The Traveo S6J33xx family of MCUs based on the Cortex-R5, from Cypress Semiconductor, is a good example of how this powerful core and its deterministic nature are being used. This device integrates the R5 core with a 2D graphics engine and CAN FD support, making it ideal for automotive applications.
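The lock-step principle mentioned above can be sketched in a few lines. This is a software toy, assuming a hypothetical `control_step` calculation and a deliberately injected single-bit fault; in a real device the comparison is performed by hardware on the cores' outputs, and the names `run_lockstep` and `fault_at` are illustrative only.

```python
def control_step(x):
    """A stand-in for one step of a control calculation (hypothetical)."""
    return (x * 3 + 1) & 0xFF


def run_lockstep(step, inputs, fault_at=None):
    """Run step on two redundant 'cores' and compare every result.

    Returns the index of the first mismatch, or None if the two cores
    agree throughout. fault_at injects a single-bit error into the
    primary core's result at that index, to exercise the comparator.
    """
    for i, value in enumerate(inputs):
        primary = step(value)
        checker = step(value)
        if i == fault_at:
            primary ^= 1  # simulated single-bit upset in the primary core
        if primary != checker:
            return i      # the monitor flags the divergence
    return None


print(run_lockstep(control_step, [1, 2, 3]))              # None
print(run_lockstep(control_step, [1, 2, 3], fault_at=1))  # 1
```

Note that a comparison of two results shows that the cores disagree, not which one is wrong, so what happens next is a system-level decision.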

Arm Microcontrollers continue to define the IoT

Today we are familiar with the Cortex segmentation that was introduced in 2004 with the Cortex-M group. This family of cores was targeted squarely at the microcontroller (MCU) end of the market. Initially, Arm chose to use the v7 architecture, but as Arm moved into more deeply embedded, low-power applications it enlisted the v6 architecture to develop the Arm Cortex-M0, M0+, and M1 cores. Like all Cortex-M cores, these execute the Thumb instruction set, which on these smaller cores is predominantly 16-bit, while the v7-based cores support the fuller Thumb-2 mix of 16-bit and 32-bit instructions rather than the A32 instruction set.

Figure 1: Silicon Labs’ EFM Tiny Gecko.

Perhaps the most widely adopted Cortex-M core is the M3; many MCU manufacturers have an M3 license and use it as the basis for their 32-bit MCU families. This core is used for a wide variety of devices, from highly integrated systems-on-chip (SoC) such as the PSoC5 SoC from Cypress Semiconductor to the more power-conscious Tiny Gecko from Silicon Labs. The Cortex-M family has grown in both directions since the M3: upward with the more capable Cortex-M4, which adds DSP instructions, and the -M4F, which also adds a floating-point unit, and downward with the aforementioned Cortex-M0 and Cortex-M0+.

Their low-power credentials and small transistor count (and therefore small silicon area) have encouraged some manufacturers to use multiple cores to create heterogeneous multicore solutions, such as the LPC5411x family from NXP, which combines a Cortex-M0+ core with a Cortex-M4 with a floating-point unit on the same device. Similarly, the PSoC6 from Cypress Semiconductor integrates the Cortex-M4F with the Cortex-M0+. The advantage of integrating two cores in a single device is that the application can use the lower-power M0+ for housekeeping tasks and simple control routines, but quickly invoke the performance of the M4 when the application needs to execute more intensive or time-sensitive processing.

Figure 2: The PSoC6 from Cypress.

When Arm introduced the Cortex-M7 core in 2014, it raised performance even further. With a six-stage superscalar pipeline and support for out-of-order completion, the M7 more closely resembles the higher-end processor cores found in applications such as smartphones, tablets, and laptops. One of the most successful MCUs based on the M7 core is STMicroelectronics’ STM32F730x8, which also includes the company’s ART technology; this allows the device to execute from Flash memory without wait states, providing engineers with even more flexibility.

With the introduction of the Arm v8 architecture, Arm was able to further improve the Cortex-M product offering, albeit without the support of 64-bit instructions. What it did do, however, was upgrade the memory protection unit (MPU) and add support for execute-only memory. The latter feature actively protects against the reverse-engineering of software, but perhaps most significantly Arm v8 for Cortex-M brought the features of TrustZone to the deeply embedded application space.

Using TrustZone makes it safe to switch between secure and non-secure states without the need for a hypervisor. It uses dedicated instructions and a privileged mode to move data between tasks using secure registers. Other tasks, including interrupt handlers, are denied access to these registers, which makes it easier for developers to create IoT devices that are highly secure.
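The separation between secure and non-secure states can be pictured with a small model. The Python below is a conceptual sketch only and does not reproduce the TrustZone hardware mechanism: the `Device` class, the `SecureFault` exception, the registered gateway functions, and the trivial `sign` routine are all hypothetical names introduced for illustration.

```python
class SecureFault(Exception):
    """Raised when non-secure code tries to reach secure state directly."""


class Device:
    def __init__(self, secret_key):
        self._secure_data = {"key": secret_key}  # visible only in the secure state
        self._gateways = {}                      # the only permitted entry points

    def register_gateway(self, name, func):
        """Expose a secure function to the non-secure side by name."""
        self._gateways[name] = func

    def read_secure(self, name, secure_state):
        """Return secure data, but only to a caller in the secure state."""
        if not secure_state:
            raise SecureFault("non-secure read of " + name)
        return self._secure_data[name]

    def call_from_non_secure(self, name, *args):
        """Enter the secure state, but only through a registered gateway."""
        if name not in self._gateways:
            raise SecureFault("no gateway named " + name)
        return self._gateways[name](self, *args)


def sign(device, message):
    """Hypothetical secure service: a toy checksum, not real cryptography."""
    key = device.read_secure("key", secure_state=True)
    return sum(message) ^ key


device = Device(secret_key=0x5A)
device.register_gateway("sign", sign)

# Non-secure code gets the result of the service, never the key itself.
tag = device.call_from_non_secure("sign", b"hi")
print(tag == (sum(b"hi") ^ 0x5A))  # True
```

The point of the model is that the non-secure side can only ask for a service through a defined entry point; a direct read of the key, or a call to an unregistered name, raises `SecureFault`.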

A good example of an MCU based on this version of the core architecture is the SAML11 from Microchip. Based on the Cortex-M23 core, this family adds security features such as a factory-provisioned root-of-trust key. The Cortex-M33 core is at the heart of Nordic Semiconductor’s nRF9160 series, which integrates the MCU core alongside an RF front end and baseband as a System in Package (SiP) for LTE connectivity using Cat-M or NB-IoT.


Figure 3: An example of a SAML11 MCU from Microchip.

Unlike most other examples of a single instruction set architecture, the Arm portfolio has successfully penetrated and, in the opinion of many, dominated the technology landscape. The way it influenced the embedded sector during the Smart Revolution cannot be overstated, and its future seems just as bright.

The Arm architecture comprises three distinct families: the Cortex-A, Cortex-R, and Cortex-M, which continue to target the three key parameters of performance, determinism, and power, respectively. This strategy ensures it maintains its heritage and retains its approach to scalability. These features are likely to become more important in the future, as the requirements of specific applications begin to merge and combine.
