By Tim Morin, Director, Strategic Marketing, Microchip Technology Inc
While the term ‘real-time system’ sounds intriguing, it simply refers to a system that executes a task periodically and deterministically. Real-time systems are generally used to control machines, so it is crucial that they operate deterministically. For example, you cannot have your numerically controlled drill press take 10 ms to move from point A to point B on Tuesday and 20 ms to perform the same operation on Wednesday. That would be quite disconcerting. Another good example is the flight control system a pilot uses. It is critical that it performs in an extremely predictable fashion each time, each day, irrespective of external conditions.
So, what is a deterministic system? Figure 1 below illustrates one. Here, the time-critical code is handled by the interrupt service routine. The interrupts fire periodically, and the execution time of that code is deterministic. If it is not, the result is as shown in Figure 2, where the hardware updates happen at irregular intervals.
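The difference between the two cases can be put into numbers. The following is a minimal sketch, not taken from any Microchip tool: the function name `period_jitter` and the timestamp values are hypothetical, chosen only to show how the spacing between hardware updates can be checked for jitter.

```python
from statistics import mean


def period_jitter(timestamps_us):
    """Return (mean period, peak-to-peak jitter) for a list of event timestamps."""
    if len(timestamps_us) < 3:
        raise ValueError("need at least three timestamps to compare periods")
    # Interval between each pair of consecutive hardware updates.
    periods = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    return mean(periods), max(periods) - min(periods)


# Hypothetical timestamps (in microseconds) of hardware updates.
deterministic = [0, 1000, 2000, 3000, 4000]  # every update exactly on time
jittery = [0, 1000, 2150, 2900, 4000]        # same average rate, uneven spacing

print(period_jitter(deterministic))
print(period_jitter(jittery))
```

Both sequences have the same mean period, so an average alone would hide the problem. It is the peak-to-peak spread that separates the deterministic system from the non-deterministic one.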
Hardware-controlled systems can benefit greatly from the richness that Linux and its associated middleware provide. A Memory Management Unit (MMU) enables Linux to virtualize physical memory for the application developer. Processors that embed an MMU typically also include an L1 cache and, in most cases, an L2 cache. Unfortunately, caches and determinism work against each other, as Figure 3 below illustrates. As the figure shows, L1 or L2 misses stall the execution pipeline while the cache lines are filled, which introduces execution jitter. Larger caches reduce the frequency of cache misses, but they cannot eliminate them completely.
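To make the effect of cache state on execution time concrete, here is a small illustrative model of an LRU set-associative cache. It is a sketch under stated assumptions, not a model of any real processor: the class `SetAssociativeCache`, the tiny geometry (4 sets, 2 ways, 64-byte lines), and the hit and miss latencies are all invented for the example.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tiny LRU set-associative cache model (illustrative only)."""

    def __init__(self, num_sets, ways, line_bytes=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)      # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)   # evict the least recently used line
        entries[tag] = True
        return False


HIT_CYCLES = 1    # assumed latency, for illustration
MISS_CYCLES = 20  # assumed latency, for illustration


def run_cycles(cache, addresses):
    """Total cycles spent on the given memory accesses."""
    return sum(HIT_CYCLES if cache.access(a) else MISS_CYCLES for a in addresses)


cache = SetAssociativeCache(num_sets=4, ways=2)
isr_addresses = [i * 64 for i in range(8)]       # eight consecutive cache lines

cold = run_cycles(cache, isr_addresses)           # nothing cached yet
warm = run_cycles(cache, isr_addresses)           # everything still cached
run_cycles(cache, [8 * 64, 9 * 64])               # other code touches two more lines
disturbed = run_cycles(cache, isr_addresses)      # some ISR lines were evicted

print(cold, warm, disturbed)
```

The same eight accesses cost a different number of cycles on each run, depending only on what else touched the cache in between. That run-to-run variation is the jitter described above.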
A branch predictor can also create execution jitter in processors that run Linux. Although a branch predictor helps increase application-level performance, there are occasions where a branch is mispredicted, leading to a flushed pipeline. In turn, these mispredictions result in non-deterministic execution behavior.
During an Interrupt Service Routine (ISR), the branch history tables used by the predictor reflect the execution history of the main application code rather than that of the ISR. Execution time is therefore bound to vary from one ISR invocation to the next, chiefly due to pipeline flushes within the ISR.
One way to counter this is to let users disable the branch predictor. This gives the application developer greater control over where and how determinism is applied in the system. Completely disabling the branch predictors is a good way to ensure application-wide determinism, but it comes at the cost of sub-optimal performance.
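The dependence of ISR timing on leftover branch history can be sketched with a single 2-bit saturating counter, a common textbook predictor. This is an illustration only: the article does not describe the predictor used in any particular core, and the function names, the branch pattern, and the "always predict not-taken" behaviour assumed for a disabled predictor are all hypothetical.

```python
def count_mispredictions(outcomes, counter):
    """Run one 2-bit saturating counter over a sequence of branch outcomes.

    counter is in 0..3; values >= 2 predict 'taken'.
    Returns (number of mispredictions, final counter state).
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses, counter


def static_not_taken_mispredictions(outcomes):
    """Assumed behaviour with the predictor off: every taken branch is a miss."""
    return sum(1 for taken in outcomes if taken)


isr_branch = [True, True, True, True]  # hypothetical ISR branch, always taken

# The counter state the ISR inherits depends on what the main application did.
with_predictor = [count_mispredictions(isr_branch, initial)[0] for initial in range(4)]
without_predictor = static_not_taken_mispredictions(isr_branch)

print(with_predictor)
print(without_predictor)
```

With the predictor on, the number of pipeline flushes in the ISR depends on the state it inherits. With it off (under the assumption above), the count is the same on every invocation but higher, which mirrors the trade-off between determinism and performance.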
RISC-V PolarFire SoC FPGA Architecture – An Introduction
Many kinds of processors are available. Some can run Linux but are not designed to execute code deterministically. Others can execute code deterministically but cannot run Linux.
Given this, the ideal solution would be an architecture in your embedded toolkit that supports both. Microchip’s recent announcement of a RISC-V based SoC FPGA architecture, PolarFire SoC, addresses this issue.
In Figure 4, we can see four 64-bit RV64GC RISC-V cores that contain an MMU and are capable of running Linux, while the RV64IMAC core cannot run Linux since it lacks an MMU. There are also differences in instruction sets between the RV64IMAC and the RV64GC. While the RV64GC contains a double-precision floating-point unit, the RV64IMAC does not.
Users can turn off the branch predictor in any of the processor cores, either after power-up or during an ISR, to increase the level of determinism within the architecture. Determinism can also be increased by choosing in-order instruction pipelines over out-of-order instruction pipelines. A side benefit of in-order machines is immunity to Spectre and Meltdown attacks.
The PolarFire SoC Memory Subsystem – An Overview
Apart from determinism, the other important aspect of PolarFire SoC is the memory subsystem, from which code is actually executed. PolarFire SoC has a completely coherent memory space. The coherency manager handles cases where multiple copies of the same data exist in memory.
PolarFire SoC comes with three memory subsystems: L1, L2, and L3. The L3 memory subsystem consists of a hardened 36-bit controller supporting DDR3/DDR4 and LPDDR3/LPDDR4. The extra 4 bits serve to add SECDED to the external L3 memory subsystem.
L1 Memory Subsystem
Each of the four RV64GC application cores comes with an 8-way set associative, 32 KB I$TIM and an 8-way set associative, 32 KB D$. Here, I$ denotes the instruction cache, D$ the data cache, and TIM stands for Tightly Integrated Memory. The L1 must always have at least one cache way.
The RV64IMAC monitor core has a 16 KB two-way set associative I$TIM and an 8 KB DTIM. The DTIM is a data scratchpad memory used by the executing code. All L1 TIMs provide low-latency, deterministic access and are Single Error Correct, Double Error Detect (SECDED) capable.
L2 Memory Subsystem
The 2 MB L2 memory subsystem is SECDED capable and has three different configuration modes. It can be configured as a 16-way set associative cache, as a Loosely Integrated Memory (LIM), or as a scratchpad memory. LIM memory can be pinned to a specific processor if needed. LIMs can be constructed in 128 KB chunks (ways).
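The arithmetic behind these modes is simple: 16 ways of 128 KB each make up the 2 MB L2. The sketch below shows how a split between LIM, scratchpad, and cache could be sanity-checked. The function `partition_l2` and its return format are hypothetical, and any additional hardware constraints (such as a minimum number of cache ways) are not modeled.

```python
L2_WAYS = 16
WAY_KB = 128  # 16 ways x 128 KB = 2 MB


def partition_l2(lim_ways, scratchpad_ways):
    """Split the L2 ways between LIM, scratchpad, and cache; sizes in KB."""
    if lim_ways < 0 or scratchpad_ways < 0:
        raise ValueError("way counts must be non-negative")
    # Whatever is not claimed as LIM or scratchpad remains ordinary cache.
    cache_ways = L2_WAYS - lim_ways - scratchpad_ways
    if cache_ways < 0:
        raise ValueError("requested more than the 16 available ways")
    return {
        "lim_kb": lim_ways * WAY_KB,
        "scratchpad_kb": scratchpad_ways * WAY_KB,
        "cache_kb": cache_ways * WAY_KB,
    }


# Hypothetical split: 4 ways of LIM for real-time code, 2 ways of scratchpad
# for message passing, and the remaining ways as L2 cache.
layout = partition_l2(lim_ways=4, scratchpad_ways=2)
print(layout)
```

In this example the real-time code gets 512 KB of LIM, 256 KB is set aside as scratchpad, and 1280 KB remains as cache.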
When configured as a LIM, the L2 memory subsystem provides deterministic access to the core it is pinned to. It is also coherent, since no other copies are shared with the L1 and L3 memory subsystems. LIM works well for deterministic code execution in both the main application and ISRs. Figure 5 below shows a deterministic system where the L2 memory subsystem is configured as a LIM and the L1s are configured as TIMs.
Unfortunately, ISR execution time can still vary even with the L2 configured as a LIM, due to mispredictions by the branch predictors. Figure 6 shows the execution of an application with L1 and L2 configured as TIM and LIM, respectively. The horizontal axis represents interrupts, while the vertical axis indicates the cycles spent within the ISR. The ISR execution time varies over time. Figure 7 shows the effect of turning off the branch predictors.
Figure 7 demonstrates determinism when the branch predictor is turned off:
Scratchpad memory is allocated out of the cache memory (since it is coherent), and it too can be configured in 128 KB chunks (ways). Scratchpad memory works well as a shared memory resource that can be used by processors executing code from the LIM as well as by processors executing code from the L1, L2, and L3 memory subsystems (typically Linux).
Figure 8 shows a possible configuration of the PolarFire SoC Microprocessor Subsystem. Here, the RV64IMAC handles the real-time function (executing from the LIM), the RV64GCs run Linux (using a portion of the L2 as cache), and the scratchpad is used to share messages across cores. If your real-time function requires floating-point performance, an RV64GC can be used after turning off its branch predictor.
While there are several processors available that can either run Linux or execute code deterministically, there are almost none that can handle both. With PolarFire SoC, hard real-time applications and Linux applications can coexist in a coherent manner by leveraging its unique, flexible memory subsystem.