Wednesday, 14 May 2025

Understanding the ARM Cortex-M Exception Model

ARM Cortex-M processors (M0/M0+, M1, M3, M4, M7) use a unified exception mechanism to handle both faults generated by the core itself and external interrupt sources. Whenever something “disturbs” normal program flow—be it a reset, a peripheral interrupt, or an internal fault—the processor switches from Thread mode into Handler mode and vectors to the appropriate handler routine. In this article, we’ll walk through the basics of Cortex-M exceptions, distinguish system exceptions from interrupts, and see how they’re laid out in the vector table.

What Is an Exception?

“Exception” is the umbrella term for any event that forces the processor to halt normal execution and enter a privileged Handler mode. Cortex-M defines two broad categories:

  • System Exceptions are generated internally by the CPU core (e.g. reset, hard fault, SVC).

  • Interrupts originate from on-chip or external peripherals (e.g. timers, GPIO, UART).

In total there are 15 slots reserved for system exceptions (though only nine are implemented on most Cortex-M devices) and up to 240 external interrupt lines, for a combined potential of 255 exception vectors.
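As a rough illustration of that numbering, the sketch below sorts an exception number into the two categories. The names `exc_kind_t`, `classify_exception`, and `exception_to_irq` are made up for this example, and treating 0 as "no exception active" is an assumption borrowed from how the core reports Thread mode, not something stated above.

```c
#include <assert.h>

/* Hypothetical helper types and names, for illustration only. */
typedef enum { EXC_NONE, EXC_SYSTEM, EXC_EXTERNAL_IRQ } exc_kind_t;

/* 1-15 are system exception slots, 16-255 are external interrupts.
   0 is assumed here to mean "no exception" (plain Thread mode). */
exc_kind_t classify_exception(unsigned exc_num)
{
    assert(exc_num <= 255u);            /* 15 + 240 vectors at most */
    if (exc_num == 0u)  return EXC_NONE;
    if (exc_num <= 15u) return EXC_SYSTEM;
    return EXC_EXTERNAL_IRQ;
}

/* External interrupt line for an exception number (IRQ0 is exception 16). */
unsigned exception_to_irq(unsigned exc_num)
{
    assert(exc_num >= 16u && exc_num <= 255u);
    return exc_num - 16u;
}
```

For example, exception 15 (SysTick) classifies as a system exception, while exception 16 is external interrupt line 0.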

The 9 Implemented System Exceptions

Exception Number   Name         Source / Trigger
1                  Reset        Power-on or external reset pin
2                  NMI          Non-maskable interrupt; cannot be disabled
3                  HardFault    Fault escalated when no other handler applies
4                  MemManage    MPU access violation
5                  BusFault     Bus error on an instruction fetch or data access (e.g. access to an invalid address)
6                  UsageFault   Undefined instruction; division by zero or unaligned access when trapping is enabled
11                 SVCall       SVC (supervisor call) instruction
14                 PendSV       Software-triggered interrupt for context switching
15                 SysTick      System timer periodic interrupt

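Note: Exception numbers 7–10 and 13 are reserved. Number 12 is the Debug Monitor exception on ARMv7-M cores (M3, M4, M7); it serves debugging and is left out of the table above.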

Exception Vector Table

At reset, the processor loads the Main Stack Pointer (MSP) from address 0x0000_0000, then reads the reset handler address from 0x0000_0004. Immediately following that in memory are the addresses for NMI, HardFault, and so on, one word per exception slot:


    0x0000_0000: Initial MSP value
    0x0000_0004: Reset_Handler
    0x0000_0008: NMI_Handler
    0x0000_000C: HardFault_Handler
    0x0000_0010: MemManage_Handler
    0x0000_0014: BusFault_Handler
    0x0000_0018: UsageFault_Handler
    … (reserved)
    0x0000_002C: SVC_Handler
    0x0000_0038: PendSV_Handler
    0x0000_003C: SysTick_Handler
    0x0000_0040: External IRQ0

Peripherals provide their own IRQ lines beginning at exception number 16 (External IRQ0) up to whatever the SoC implements (e.g. up to IRQ81 on many STM32 parts).
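Because the table holds one 32-bit word per slot, with slot 0 taken by the initial MSP, the offset of any vector is simply four times its exception number. A minimal sketch of that arithmetic follows; the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a vector from the start of the table.
   Slot 0 holds the initial MSP, slot 1 the reset handler, and so on. */
uint32_t vector_offset(uint32_t exc_num)
{
    assert(exc_num <= 255u);
    return 4u * exc_num;
}

/* External interrupts start at exception number 16. */
uint32_t irq_vector_offset(uint32_t irq)
{
    assert(irq < 240u);
    return vector_offset(16u + irq);
}
```

With these helpers, SVCall (exception 11) lands at offset 0x2C, SysTick (15) at 0x3C, and IRQ0 at 0x40.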

Conclusion

The Cortex-M exception model unifies resets, fault conditions, and peripheral interrupts under a single vectored mechanism. Understanding the difference between system exceptions (reset, faults, SVC, PendSV, SysTick) and external interrupts—and how they map into the vector table—is fundamental for writing robust embedded software. In upcoming articles we’ll dive into how to configure priorities, write handlers for faults, and harness PendSV/SVC for RTOS task switching.

Written By: Musaab Taha


This article was improved with the assistance of AI.

Monday, 12 May 2025

Mastering Stack Memory on ARM Cortex-M: Models, Pointers, and Initialization

Stack memory is a region of RAM used for Last-In, First-Out (LIFO) storage of transient data such as function return addresses and local variables. Most CPU architectures, including ARM Cortex-M, provide dedicated PUSH and POP instructions that automatically adjust the stack pointer (SP, register R13) when storing and retrieving data.

Core Uses of Stack Memory

  • Function Calls: The stack holds saved return addresses (a function pushes LR before calling another function) and any parameters that do not fit in registers R0–R3, enabling nested calls without fixed frame sizes.

  • Local Variables: Space for non-static local variables is allocated on the stack on function entry and reclaimed on exit.

  • Interrupt Contexts: Upon an interrupt or exception, the processor hardware pushes R0–R3, R12, LR, PC, and xPSR onto the stack, preserving thread state and enabling seamless return.
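The eight registers pushed on exception entry form a fixed 32-byte frame, with R0 at the lowest address. The sketch below describes that layout as a C struct; it covers only the basic frame (no floating-point state), and the names `exception_frame_t` and `stacked_pc` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame as pushed by hardware, lowest address first. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;
    uint32_t pc;    /* address the interrupted code will resume at */
    uint32_t xpsr;
} exception_frame_t;

/* Read the stacked PC given the stack pointer value at handler entry. */
uint32_t stacked_pc(const uint32_t *sp_at_entry)
{
    assert(sp_at_entry != NULL);
    return sp_at_entry[offsetof(exception_frame_t, pc) / sizeof(uint32_t)];
}
```

A fault handler can use the same layout to find out which instruction was executing when the exception was taken.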

Stack Placement and Memory Layout

RAM in microcontrollers is typically partitioned—via linker scripts—into global data, heap, and stack regions. For example, a 128 KB SRAM might reserve its low addresses for globals, the next block for heap (dynamic allocation), and the high addresses for stack growth. Placing the stack at the top of RAM harnesses the full descending growth model and leaves contiguous space for heap expansion.

Full Descending Stack Model

ARM Cortex-M implements a full descending model: on PUSH, SP is decremented then data stored; on POP, data loaded then SP incremented. SP always points to the most recently pushed item. This model simplifies interrupt entry/exit sequencing and aligns with the ARM Architecture Procedure Call Standard (AAPCS).
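The ordering of "decrement then store" and "load then increment" can be mimicked on any machine with a small array. The model below is only a sketch: `fd_stack_t` and its functions are invented names, and SP is kept as a word index rather than a real address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;   /* word index; STACK_WORDS means the stack is empty */
} fd_stack_t;

void fd_init(fd_stack_t *s)
{
    s->sp = STACK_WORDS;            /* start one past the highest word */
}

void fd_push(fd_stack_t *s, uint32_t value)
{
    assert(s->sp > 0u);             /* guard against overflow */
    s->sp -= 1u;                    /* decrement first ... */
    s->mem[s->sp] = value;          /* ... then store */
}

uint32_t fd_pop(fd_stack_t *s)
{
    assert(s->sp < STACK_WORDS);    /* guard against underflow */
    uint32_t value = s->mem[s->sp]; /* load first ... */
    s->sp += 1u;                    /* ... then increment */
    return value;
}
```

After two pushes, `sp` indexes the second value pushed, which matches the rule that SP always points at the most recently pushed item.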

Banked Stack Pointers: MSP vs. PSP

Cortex-M cores provide two banked stack pointers:

  • MSP (Main Stack Pointer): Default after reset, used by exception handlers.

  • PSP (Process Stack Pointer): Selectable in Thread mode by setting the SPSEL bit (bit 1) in the CONTROL register.

Switching to PSP lets an RTOS kernel reserve MSP for its own use while giving each task an independent PSP-tracked stack.
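The selection rule can be written as a small pure function: Handler mode always uses MSP, and Thread mode uses PSP only when SPSEL is set. This is a host-side sketch rather than target code; `current_sp` and the macro names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_NPRIV (1u << 0)   /* Thread-mode privilege bit */
#define CONTROL_SPSEL (1u << 1)   /* Thread-mode stack select bit */

/* Return the stack pointer value the core would use in the given state. */
uint32_t current_sp(bool handler_mode, uint32_t control,
                    uint32_t msp, uint32_t psp)
{
    /* Stack pointers are always word-aligned. */
    assert((msp & 3u) == 0u && (psp & 3u) == 0u);

    if (handler_mode) {
        return msp;               /* handlers always run on MSP */
    }
    return (control & CONTROL_SPSEL) ? psp : msp;
}
```

An RTOS typically sets SPSEL once when it starts the first task, so that task code runs on PSP while every exception continues to use the kernel's MSP stack.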

Stack Initialization and Overflow Detection

Before main(): The processor fetches the initial SP value from the first vector-table entry and loads it into MSP automatically.
After startup: Once the target memory (e.g., external RAM) has been initialized, the application may point MSP or PSP at a new stack region.
Overflow detection: By comparing SP against configured limits, software can detect stack overflows and limit corruption of adjacent memory regions.
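One way to sketch this kind of monitoring is shown below. The first function is the direct limit comparison; the other two implement "stack painting", a common complementary technique (not described above) in which the stack region is filled with a known pattern at startup and later scanned to see how much was never touched. All names and the fill value are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu   /* arbitrary marker value */

/* Direct check: is SP still inside [limit, top]? */
bool sp_within_limits(uintptr_t sp, uintptr_t limit, uintptr_t top)
{
    assert(limit <= top);
    return sp >= limit && sp <= top;
}

/* Fill the whole stack region with the marker before it is used. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; ++i) {
        base[i] = STACK_FILL;
    }
}

/* Count words never overwritten, scanning up from the lowest address.
   The stack grows downward, so untouched words sit at the bottom. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_FILL) {
        ++n;
    }
    return n;
}
```

A result of zero unused words suggests the stack has reached, and possibly passed, its lower bound.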

Conclusion

Effective stack management in embedded systems hinges on understanding the LIFO storage model, ARM’s full descending operation, and the dual MSP/PSP mechanism. Proper placement—defined by linker scripts—and robust initialization safeguard against overflow and ensure reliable task and interrupt handling in both bare-metal and RTOS environments.

Written By: Musaab Taha


This article was improved with the assistance of AI.

Friday, 9 May 2025

Understanding ARM Cortex-M Memory Map and Bus Interfaces

Embedded systems rely on well-defined address spaces and efficient on-chip communication to deliver performance and reliability. ARM’s Cortex-M processors provide a 32-bit addressable range (0x0000 0000–0xFFFF FFFF) divided into fixed memory regions—code, SRAM, peripherals, external memories, and private-peripheral space—collectively known as the memory map. Complementing this, ARM’s AMBA (Advanced Microcontroller Bus Architecture) defines high-performance (AHB) and low-power (APB) buses for interconnecting the processor, memories, and peripherals. Many vendors also support bit-banding, an optional feature that remaps individual bits in SRAM or peripheral registers to a dedicated alias region for atomic, single-bit operations. In this article, we’ll explore each of these concepts—including the role of bus matrices, AHB→APB bridges, and practical use cases for bit-banding—and show how they underpin robust, high-efficiency microcontroller designs.


The ARM Cortex-M Memory Map

Why a Memory Map?

The Cortex-M’s 32-bit address bus can theoretically access 4 GiB of memory. To manage this, ARM divides the address space into fixed regions aligned on 512 MiB boundaries:

  • 0x0000 0000–0x1FFF FFFF: Code region (Flash, ROM, or external code)

  • 0x2000 0000–0x3FFF FFFF: SRAM region

  • 0x4000 0000–0x5FFF FFFF: On-chip peripheral registers

  • 0x6000 0000–0x9FFF FFFF: External RAM (external SRAM / SDRAM), 1 GiB

  • 0xA000 0000–0xDFFF FFFF: External device / shared memory, 1 GiB

  • 0xE000 0000–0xE00F FFFF: Private Peripheral Bus (PPB): NVIC, SysTick, SCB, etc.

  • 0xE010 0000–0xFFFF FFFF: Vendor-specific system region

Each region has its own size and execution attributes (e.g., peripheral and PPB regions are marked Execute-Never to prevent code injection).
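A short decoder makes the region boundaries concrete. The bounds used here follow the ARMv7-M architectural map, in which external RAM runs to 0x9FFF FFFF and external devices occupy 0xA000 0000–0xDFFF FFFF; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    REGION_CODE,
    REGION_SRAM,
    REGION_PERIPHERAL,
    REGION_EXTERNAL_RAM,
    REGION_EXTERNAL_DEVICE,
    REGION_PPB,
    REGION_VENDOR
} mem_region_t;

/* Map an address to its architectural region by upper bound. */
mem_region_t region_of(uint32_t addr)
{
    if (addr <= 0x1FFFFFFFu) return REGION_CODE;
    if (addr <= 0x3FFFFFFFu) return REGION_SRAM;
    if (addr <= 0x5FFFFFFFu) return REGION_PERIPHERAL;
    if (addr <= 0x9FFFFFFFu) return REGION_EXTERNAL_RAM;
    if (addr <= 0xDFFFFFFFu) return REGION_EXTERNAL_DEVICE;
    if (addr <= 0xE00FFFFFu) return REGION_PPB;
    return REGION_VENDOR;
}

/* Default Execute-Never attribute: device-like regions cannot hold code. */
bool region_is_xn(mem_region_t r)
{
    assert(r <= REGION_VENDOR);
    return r == REGION_PERIPHERAL || r == REGION_EXTERNAL_DEVICE ||
           r == REGION_PPB || r == REGION_VENDOR;
}
```

For instance, an NVIC register at 0xE000 E100 decodes to the PPB region, which is Execute-Never, while an address in SRAM is executable by default.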


AMBA Bus Interfaces: AHB and APB

High-Performance (AHB-Lite) Buses

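Cortex-M3 and Cortex-M4 cores expose several bus interfaces so that instruction and data accesses can proceed in parallel. The first three are AHB-Lite masters; the PPB is a separate private interface rather than an AHB-Lite master:

  1. I-Code: Instruction fetch from code memory

  2. D-Code: Data fetch from code region (constants, tables)

  3. System Bus (S-Bus): Access to SRAM and on-chip peripherals (GPIO, timers, ADC, etc.)

  4. Private Peripheral Bus (PPB): System control and debug registers such as NVIC, MPU, SysTick, SCB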

On many MCUs, multiple AHB buses (AHB1, AHB2, …) feed into an AHB matrix and arbiter, handling requests from DMA, Ethernet, USB, and CPU interfaces.

Peripheral (APB) Buses

Lower-speed peripherals are connected via APB buses, accessed through an AHB-to-APB bridge. The bus names and clock limits are vendor-specific; on an STM32F407, for example, there are two:

  • APB1: Up to 42 MHz (UART, I²C, SPI, CAN, etc.)

  • APB2: Up to 84 MHz (high-speed timer, ADC control, system configuration)

Splitting low-speed and high-speed peripherals across AHB and APB saves power and simplifies clocking.


Bit-Banding: Atomic Single-Bit Access

The Challenge of Bit Manipulation

On a standard byte-addressed bus, setting or clearing one bit in SRAM or a peripheral register requires read–modify–write:

    uint8_t val = *addr;    // LDRB
    val &= ~(1 << bit);     // modify
    *addr = val;            // STRB

This three-step sequence can be disrupted by interrupts or require locking for thread safety.

The Bit-Banding Solution

ARM’s optional bit-banding feature (available on Cortex-M3 and Cortex-M4) maps each bit in a 1 MiB region of SRAM or peripherals to its own 32-bit word in an alias region:

  • SRAM bit-band region: 0x2000 0000–0x200F FFFF

  • SRAM alias region: 0x2200 0000–0x23FF FFFF

  • Periph bit-band region: 0x4000 0000–0x400F FFFF

  • Periph alias region: 0x4200 0000–0x43FF FFFF

To clear bit n of address A, compute:

    alias = alias_base + 32*(A - region_base) + 4*n;
    *(volatile uint32_t*)alias = 0;   // atomic bit clear

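This single write updates only that bit. The hardware performs the underlying read-modify-write atomically, so an interrupt cannot split it and no interrupt masking is needed.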
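The alias formula can be checked with plain integer arithmetic on any host. The sketch below covers both bit-band regions; `bitband_alias` and the macro names are invented here, and returning 0 for an address outside either region is just a convention chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BB_BASE       0x20000000u
#define SRAM_ALIAS_BASE    0x22000000u
#define PERIPH_BB_BASE     0x40000000u
#define PERIPH_ALIAS_BASE  0x42000000u
#define BB_REGION_SIZE     0x00100000u   /* 1 MiB */

/* Alias word address for bit `bit` of the byte or word at `addr`.
   Bits are counted from `addr`, so values above 7 reach into the
   following bytes of a word-aligned address.
   Returns 0 if `addr` is not inside a bit-band region. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    assert(bit < 32u);

    if (addr >= SRAM_BB_BASE && addr - SRAM_BB_BASE < BB_REGION_SIZE) {
        return SRAM_ALIAS_BASE + 32u * (addr - SRAM_BB_BASE) + 4u * bit;
    }
    if (addr >= PERIPH_BB_BASE && addr - PERIPH_BB_BASE < BB_REGION_SIZE) {
        return PERIPH_ALIAS_BASE + 32u * (addr - PERIPH_BB_BASE) + 4u * bit;
    }
    return 0u;
}
```

As a worked case, bit 3 of the word at 0x2000 0004 maps to 0x2200 0000 + 32 × 4 + 4 × 3 = 0x2200 008C. On a device with bit-banding, writing 1 or 0 to that alias word sets or clears the bit.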


Conclusion

ARM’s memory map, AMBA bus interfaces, and bit-banding together create a powerful, flexible foundation for Cortex-M microcontrollers. The fixed address regions simplify peripheral and memory placement, while AHB/APB separation optimizes performance and power. Bit-banding adds a hardware-accelerated, atomic bit-manipulation capability invaluable for lock-free concurrency and efficient GPIO or flag management. Understanding these features helps embedded developers design memory-efficient, high-throughput, and reliable firmware on Cortex-M platforms.

Written By: Musaab Taha


This article was improved with the assistance of AI.

Tuesday, 6 May 2025

Switching Privilege Levels on ARM Cortex-M: Using the CONTROL Register for Secure Thread Mode

In this article, we’ll walk through how the ARM Cortex-M architecture enforces two privilege levels in Thread mode, demonstrate dropping Thread mode to unprivileged via inline assembly (MRS/MSR on the CONTROL register), trigger and observe the resulting fault (a BusFault on Cortex-M3/M4/M7) when unprivileged code attempts protected accesses, and explain how to regain privilege only via exception (Handler) mode. Finally, we’ll discuss why this mechanism underpins RTOS task isolation and security.

Thread vs. Handler Mode Privilege Levels

ARM Cortex-M cores distinguish between two execution contexts:

  • Thread Mode: Where application or RTOS tasks run; can be Privileged or Unprivileged based on CONTROL.nPRIV (bit 0). 

  • Handler Mode: Always Privileged, entered on exception or interrupt.

By default after reset, Thread mode is Privileged. Once nPRIV has been set, only privileged code can clear it again, which in practice means a handler running in Handler mode.

Reading and Writing the CONTROL Register

The CONTROL register’s nPRIV bit controls Thread-mode privilege:

  • Reading: MRS R0, CONTROL moves CONTROL into R0.

  • Modifying: Perform a read-modify-write sequence:

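    __asm volatile (
        "MRS R0, CONTROL \n"
        "ORR R0, R0, #1  \n"   /* set nPRIV (bit 0) */
        "MSR CONTROL, R0 \n"
        "ISB             \n"   /* ensure the new privilege level applies to the next instruction */
        ::: "r0", "memory"
    );

    MSR CONTROL, R0 takes effect only when executed by privileged code. A write attempted from unprivileged Thread mode is ignored, so unprivileged code cannot restore its own privilege.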

Demonstration: Dropping to Unprivileged Thread Mode

  1. Start in Thread mode, Privileged (CONTROL = 0).

  2. Call drop_to_unprivileged() implementing the MRS/ORR/MSR sequence above.

  3. CONTROL becomes 1; Thread mode is now Unprivileged.

  4. Attempt to write an NVIC register (only Privileged):

    NVIC->ISPR[0] |= (1U << irq);

  5. Result: On Cortex-M3/M4/M7 the access raises a BusFault, because the NVIC registers sit in the Private Peripheral Bus region, which rejects unprivileged accesses.

Observing the BusFault

On hardware or in the debugger:

  • Execution jumps to the HardFault handler, because the BusFault handler is disabled by default and the fault escalates.

  • The fault analyzer shows a BusFault as the origin, indicating a restricted access by Unprivileged Thread code.

Regaining Privilege: Exception Entry Path

Unprivileged Thread code cannot directly set CONTROL.nPRIV back to 0. The only path is:

  1. Generate an exception (e.g., execute an SVC instruction or wait for an interrupt).

  2. On entry, core switches to Handler mode (always Privileged).

  3. In Handler, use MSR CONTROL, <value> to clear nPRIV.

  4. Return from exception, dropping back to Thread mode now Privileged.
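The round trip can be traced with a tiny host-side model of the privilege rules. This is a deliberately simplified sketch: it ignores nested exceptions and every CONTROL bit other than nPRIV, and `cpu_model_t` and its functions are invented for illustration rather than taken from any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_NPRIV (1u << 0)

typedef struct {
    bool     handler_mode;   /* true while an exception handler runs */
    uint32_t control;        /* modelled CONTROL register */
} cpu_model_t;

/* Handler mode is always privileged; Thread mode depends on nPRIV. */
bool cpu_is_privileged(const cpu_model_t *cpu)
{
    return cpu->handler_mode || (cpu->control & CONTROL_NPRIV) == 0u;
}

/* Models MSR CONTROL: the write is ignored when unprivileged. */
void cpu_write_control(cpu_model_t *cpu, uint32_t value)
{
    if (cpu_is_privileged(cpu)) {
        cpu->control = value;
    }
}

void cpu_exception_entry(cpu_model_t *cpu)
{
    assert(!cpu->handler_mode);   /* nesting is not modelled */
    cpu->handler_mode = true;
}

void cpu_exception_return(cpu_model_t *cpu)
{
    assert(cpu->handler_mode);
    cpu->handler_mode = false;
}
```

Driving the model through the four steps shows the key property: an attempt to clear nPRIV from unprivileged Thread mode changes nothing, while the same write made between exception entry and return restores privilege.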

RTOS Task Isolation Use Case

Real-time kernels (e.g., FreeRTOS) leverage this hardware feature to sandbox user tasks:

  • Kernel runs at Privileged Thread mode to manage resources.

  • User tasks are launched as Unprivileged Thread mode, preventing direct access to critical control registers or disabling interrupts.

  • System calls (SVCall) elevate to Handler mode, letting the kernel perform privileged operations on behalf of the user task.

Conclusion

ARM Cortex-M’s dual privilege levels in Thread mode and mandatory Handler-mode privilege provide a lightweight, hardware-enforced sandbox. By mastering MRS/MSR access to the CONTROL register and the exception-based path back to Privileged mode, embedded developers can build robust, secure firmware and RTOS task isolation. This mechanism is fundamental for preventing errant or malicious user code from compromising system stability or security.

Written By: Musaab Taha


This article was improved with the assistance of AI.

Friday, 2 May 2025

Understanding the ARM Cortex-M Reset Sequence

Upon any hardware or power-on reset, an ARM Cortex-M processor follows a precise sequence to bring the system into a known, stable state before executing application code. First, the Main Stack Pointer (MSP) is initialized from the value at address 0x0000 0000, ensuring the stack is correctly set up. Next, the Program Counter (PC) is loaded with the reset handler’s address from 0x0000 0004, transferring control to the startup code. The reset handler—typically provided by the MCU’s startup file—performs critical early initialization: setting up the data and BSS sections, configuring clocks, and initializing the C runtime environment before calling main(). This flow guarantees that by the time user code executes, all core hardware and memory regions are correctly configured.

The Three Key Steps of Reset

  1. MSP Initialization

    • On reset, the processor reads the 32-bit value at address 0x0000 0000 into the Main Stack Pointer (MSP), establishing the stack base for exception handling and function call returns.

  2. PC Loading

    • It then reads the next 32-bit word at 0x0000 0004, which holds the address of the reset handler, into the PC, causing the CPU to branch there.

  3. Executing the Reset Handler

    • The reset handler (typically in the startup assembly file) runs early initialization routines—relocating and zero-clearing memory sections, setting up system clocks, and initializing the C library—before finally calling the main() function of the application.
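The memory-initialization part of step 3 usually boils down to two loops: copy the initial values of `.data` from flash to RAM, then zero `.bss`. The sketch below takes the section boundaries as parameters so it can run on a host; in a real startup file they would come from linker-script symbols, whose names vary by toolchain. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Copy initialized data from its load address and clear zero-initialized data.
   The *_end pointers are one past the last word of each section. */
void init_data_and_bss(const uint32_t *data_load,
                       uint32_t *data_start, uint32_t *data_end,
                       uint32_t *bss_start, uint32_t *bss_end)
{
    assert(data_start <= data_end && bss_start <= bss_end);

    for (uint32_t *dst = data_start; dst < data_end; ++dst) {
        *dst = *data_load++;      /* .data: copy initial values from flash */
    }
    for (uint32_t *dst = bss_start; dst < bss_end; ++dst) {
        *dst = 0u;                /* .bss: must read as zero in C */
    }
}
```

Only after both loops have run can C code rely on global variables holding their declared initial values.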

Why the Reset Sequence Matters

  • Memory Safety: MSP setup prevents stack corruption by pointing to valid SRAM.

  • Predictable Startup: By fetching the reset handler from a fixed vector table, the system always begins execution at a known location.

  • C Runtime Readiness: Automatic data/BSS initialization and standard library setup ensure that global variables and library functions behave correctly once main() runs.

Where It Lives: The Vector Table and Startup File

  • The vector table resides at address 0x0000 0000 at reset (typically the start of flash, or flash aliased to that address) and holds the initial MSP and PC values.

  • The startup file (often named startup_stm32f4xx.s or similar) defines the reset handler and exception vectors. It implements low-level assembly for early setup and invokes higher-level C routines.
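A bootloader or test tool can read the same two words to sanity-check an image before jumping to it. The following sketch assumes the caller supplies the RAM bounds of the target device; the type and function names are made up for this example. It relies on two architectural facts: stack pointers are word-aligned, and a handler address must have bit 0 set because Cortex-M executes only Thumb code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t initial_msp;     /* word 0 of the vector table */
    uint32_t reset_handler;   /* word 1 of the vector table */
} boot_vectors_t;

boot_vectors_t read_boot_vectors(const uint32_t *table)
{
    assert(table != NULL);
    boot_vectors_t v = { table[0], table[1] };
    return v;
}

/* ram_end is one past the last RAM byte; an initial MSP equal to ram_end
   is normal, since a full descending stack decrements before storing. */
bool boot_vectors_plausible(boot_vectors_t v, uint32_t ram_start, uint32_t ram_end)
{
    bool msp_in_ram    = v.initial_msp > ram_start && v.initial_msp <= ram_end;
    bool msp_aligned   = (v.initial_msp & 3u) == 0u;
    bool thumb_bit_set = (v.reset_handler & 1u) != 0u;
    return msp_in_ram && msp_aligned && thumb_bit_set;
}
```

For a device with 128 KB of SRAM at 0x2000 0000, a table starting with 0x2002 0000 and 0x0800 0199 would pass, while a reset vector with a clear bit 0 would be rejected.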

Summary

Understanding the Cortex-M reset sequence—from MSP load to reset handler execution—is fundamental for embedded developers. It underpins reliable system boot, correct memory initialization, and seamless transition into application code.

Written By: Musaab Taha


This article was improved with the assistance of AI.