Measuring CPU usage with Rust Embassy

Jul 14, 2024 Tags: Rust

In the last couple of weeks I started looking into Rust for embedded development, and I really fell in love with Embassy, an async framework for embedded systems. Thanks to its documentation and examples I got up to speed fairly quickly, even with my very limited knowledge of embedded programming and of Rust in general.

One thing I felt was missing, however, was a way to measure the CPU usage of my application. Since what I was planning to do is very CPU-intensive, I needed a way to tell how close I was to the limits of the processor. The Embassy documentation links to a blog post that explains the basics of how to do it, but for a completely different framework.

After some tinkering I was able to port the code to Embassy. I created this GitLab repo with some examples so you can get it working in no time. If you don't care about how it works, just take the code from there; if you want to know the details, read on!

This blog post assumes you are working with a 32-bit ARM (Cortex-M) processor. Embassy's support differs between architectures; for example, some of them lack interrupt executors (at the time of writing). Porting this code to a different architecture shouldn't be too complicated, though. If you do so, or if this blog post becomes outdated due to changes in Embassy, please let me know so I can update it.

The basics

Let's start with the most basic Embassy example, blinking an LED:

#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_rp::gpio;
use embassy_time::{Ticker, Duration};
use gpio::{Level, Output};
use {defmt_rtt as _, panic_probe as _};

#[embassy_executor::main]
async fn main(spawner: Spawner) {
    let p = embassy_rp::init(Default::default());
    let led = Output::new(p.PIN_25, Level::Low);
    spawner.spawn(blink(led)).unwrap();
}

#[embassy_executor::task]
pub async fn blink(mut led: gpio::Output<'static>) {
    let mut ticker = Ticker::every(Duration::from_millis(500));
    loop {
        led.toggle();
        ticker.next().await;
    }
}

This looks very simple, but the simplicity comes from the fact that a lot is hidden from us. We can't access the inner Embassy executor, so we have no way of measuring how much time we actually spend doing work. Looking at the source code of the embassy_executor::main macro, or at the multiprio Embassy example, we can see that the previous code is equivalent to this:

#![no_std]
#![no_main]

use cortex_m_rt::entry;

use embassy_executor::Executor;
use embassy_rp::gpio;
use embassy_time::{Duration, Ticker};
use gpio::{Level, Output};
use static_cell::StaticCell;

use {defmt_rtt as _, panic_probe as _};

static EXECUTOR: StaticCell<Executor> = StaticCell::new();

#[entry]
fn main() -> ! {
    let p = embassy_rp::init(Default::default());
    let led = Output::new(p.PIN_25, Level::Low);

    let executor = EXECUTOR.init(Executor::new());
    executor.run(|spawner| {
        spawner.spawn(blink(led)).unwrap();
    });
}
...

We start by creating an executor, using StaticCell to give it a 'static lifetime (this is required because Executor::run takes &'static mut self). Then, by calling executor.run, we spawn the async tasks and enter the main loop.

One thing you may notice is that the main function is no longer async; instead it is a cortex_m_rt entry point. Its return type is !, which means it can never return, so it must end in an infinite loop. Here executor.run provides that loop, since it never returns either.

Even though we have peeled away the top layer of abstraction, we still have to go deeper to reach the main loop. Looking at the source code of Executor, we can see that only a few small changes are needed to get there:

...
use embassy_executor::raw::Executor as RawExecutor;
...

static EXECUTOR: StaticCell<RawExecutor> = StaticCell::new();

#[entry]
fn main() -> ! {
    let p = embassy_rp::init(Default::default());
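    let led = Output::new(p.PIN_25, Level::Low);

    // Same context value that the thread-mode Executor passes internally.
    let executor = EXECUTOR.init(RawExecutor::new(usize::MAX as *mut ()));
    let spawner = executor.spawner();
    spawner.spawn(blink(led)).unwrap();
    loop {
        // Sleep until an event (such as a task wake-up) occurs...
        cortex_m::asm::wfe();
        // ...then run every task that has been woken.
        unsafe { executor.poll() };
    }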
}
...

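Executor is simply a wrapper around RawExecutor. The raw executor has to be polled every time something happens that may wake up one of our tasks. Rather than polling in a busy loop, we call wfe, an ARM instruction that stands for "wait for event": it puts the core to sleep until an event arrives, which is exactly what we need. The usize::MAX context passed to RawExecutor::new is the value the thread-mode Executor uses internally; it tells Embassy's Cortex-M pender to signal a task wake-up with the sev (send event) instruction, which is what makes wfe return.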
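To see why this loop does not miss wake-ups, it helps to know that wfe and sev share a one-bit event latch: sev sets it, and wfe returns immediately (clearing it) if it is already set, otherwise it sleeps. Below is a host-side model of that latch built on std's Mutex and Condvar. EventRegister is a hypothetical name, and the model ignores hardware details such as interrupts also counting as wake-up events. Because Embassy's thread-mode pender wakes the core with sev (as far as I can tell from embassy-executor's Cortex-M code), a wake-up that happens while poll is running is latched, and the next wfe returns at once instead of sleeping through it.

```rust
use std::sync::{Condvar, Mutex};

/// Host-side model of the Cortex-M event register (illustration only).
pub struct EventRegister {
    // The one-bit event latch.
    latch: Mutex<bool>,
    // Lets a waiting thread sleep until the latch is set.
    cond: Condvar,
}

impl EventRegister {
    pub fn new() -> Self {
        Self { latch: Mutex::new(false), cond: Condvar::new() }
    }

    /// Model of `sev`: set the latch and wake anyone waiting in `wfe`.
    pub fn sev(&self) {
        *self.latch.lock().unwrap() = true;
        self.cond.notify_all();
    }

    /// Model of `wfe`: if an event is already latched, consume it and return
    /// immediately; otherwise sleep until some `sev` sets the latch.
    pub fn wfe(&self) {
        let mut latch = self.latch.lock().unwrap();
        while !*latch {
            latch = self.cond.wait(latch).unwrap();
        }
        *latch = false;
    }

    /// Whether an event is currently latched.
    pub fn pending(&self) -> bool {
        *self.latch.lock().unwrap()
    }
}
```

Calling sev and then wfe returns immediately, which mirrors how a wake-up signalled during poll is not lost.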

Measuring sleep time

Now that we control the main loop, we can measure the CPU usage. Actually, we are going to measure the opposite, the time spent sleeping, because it is simpler: we read the time before calling wfe and again after the instruction returns, and add the difference to a running total.

...
use embassy_time::{Duration, Instant, Ticker};
...
fn main() -> ! {
    ...
    let mut sleep_tick_count = 0;
    loop {
        let before = Instant::now().as_ticks();
        cortex_m::asm::wfe();
        let after = Instant::now().as_ticks();
        sleep_tick_count += after - before;
        unsafe { executor.poll() };
    }
}
...
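The accounting itself can be checked on a desktop. This std-only sketch replaces Instant::now().as_ticks() with a hypothetical FakeClock and wfe with a closure that advances it; SleepAccounting is likewise a made-up name. Only the time spent inside the sleep closure is added to the total, while time spent "polling" between calls is not.

```rust
use std::cell::Cell;

/// Accumulates time spent asleep, mirroring the before/after bookkeeping
/// around `wfe` in the main loop.
pub struct SleepAccounting {
    pub sleep_ticks: u64,
}

impl SleepAccounting {
    pub fn new() -> Self {
        Self { sleep_ticks: 0 }
    }

    /// Reads the clock, runs `sleep` (stand-in for `wfe`), reads the clock
    /// again and adds the difference to the running total.
    pub fn measure(&mut self, now: impl Fn() -> u64, sleep: impl FnOnce()) {
        let before = now();
        sleep();
        let after = now();
        self.sleep_ticks += after - before;
    }
}

/// Hypothetical tick counter that only advances when told to.
pub struct FakeClock {
    ticks: Cell<u64>,
}

impl FakeClock {
    pub fn new() -> Self {
        Self { ticks: Cell::new(0) }
    }

    pub fn now(&self) -> u64 {
        self.ticks.get()
    }

    pub fn advance(&self, by: u64) {
        self.ticks.set(self.ticks.get() + by);
    }
}
```

For example, sleeping 40 ticks, working 10, then sleeping 25 leaves sleep_ticks at 65 out of 75 elapsed ticks.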

Now we have to display this information. Since we probably want to print the CPU usage at regular intervals, we can create a periodic task that reads the sleep tick count and computes the ratio between the sleep ticks and the total ticks elapsed since its previous run; the CPU usage is one minus that ratio.

However, since the sleep tick count has to be shared between two functions, main and the print task, we can't use a regular integer: we need something that is safe to share, namely an atomic variable, for which data races are impossible. The portable_atomic crate provides atomic integers across architectures, even those that don't natively support atomic operations (on such targets, like the RP2040's Cortex-M0+, you need to enable one of its fallback features, such as critical-section). portable_atomic::AtomicU64 behaves like an integer, except that you increase its value with the fetch_add method and read it with the load method. Simple, right?
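portable_atomic's AtomicU64 mirrors the API of the standard library's atomics (fetch_add, load, Ordering), so the sharing pattern can be tried on a desktop with std::sync::atomic::AtomicU64. In this sketch several threads stand in for the main loop and the print task; the hypothetical count_concurrently function uses a local counter with scoped threads instead of a static so it can be called repeatedly. Relaxed ordering is enough here because the counter does not guard any other data.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Spawns `writers` threads that each add 1 to a shared counter
/// `per_writer` times, then returns the final value.
pub fn count_concurrently(writers: usize, per_writer: u64) -> u64 {
    // Plays the role of SLEEP_TICKS in the firmware.
    let sleep_ticks = AtomicU64::new(0);
    // All scoped threads are joined before `scope` returns.
    thread::scope(|s| {
        for _ in 0..writers {
            s.spawn(|| {
                for _ in 0..per_writer {
                    sleep_ticks.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    sleep_ticks.load(Ordering::Relaxed)
}
```

count_concurrently(4, 10_000) returns 40_000: no increment is lost, even with several writers.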

The final code is this:

...
use portable_atomic::{AtomicU64, Ordering};

static SLEEP_TICKS: AtomicU64 = AtomicU64::new(0);
...
fn main() -> ! {
    ...
    spawner.spawn(cpu_usage()).unwrap();
    loop {
        let before = Instant::now().as_ticks();
        cortex_m::asm::wfe();
        let after = Instant::now().as_ticks();
        SLEEP_TICKS.fetch_add(after - before, Ordering::Relaxed);
        unsafe { executor.poll() };
    }
}

#[embassy_executor::task]
async fn cpu_usage() {
    let mut previous_tick = 0u64;
    let mut previous_sleep_tick = 0u64;
    let mut ticker = Ticker::every(Duration::from_millis(1000));
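    loop {
        // Snapshot the total elapsed ticks and the accumulated sleep ticks.
        let current_tick = Instant::now().as_ticks();
        let current_sleep_tick = SLEEP_TICKS.load(Ordering::Relaxed);
        // Ticks elapsed (in total and asleep) since the previous iteration.
        let sleep_tick_difference = (current_sleep_tick - previous_sleep_tick) as f32;
        let tick_difference = (current_tick - previous_tick) as f32;
        // Busy fraction = 1 - (time asleep / time elapsed).
        let usage = 1f32 - sleep_tick_difference / tick_difference;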
        previous_tick = current_tick;
        previous_sleep_tick = current_sleep_tick;
        defmt::info!("Cpu usage: {}%", usage * 100f32);
        ticker.next().await;
    }
}
...
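The arithmetic in cpu_usage can be pulled out into a plain function and tested on the host. In this sketch UsageMeter is a hypothetical name; it keeps the previous snapshots just like the task does and, unlike the task, returns None when no time has elapsed instead of dividing by zero.

```rust
/// Computes CPU usage from successive tick snapshots, like the cpu_usage task.
pub struct UsageMeter {
    previous_tick: u64,
    previous_sleep_tick: u64,
}

impl UsageMeter {
    pub fn new() -> Self {
        Self { previous_tick: 0, previous_sleep_tick: 0 }
    }

    /// Takes the current total and sleep tick counts and returns the busy
    /// percentage since the previous call, or None if no ticks elapsed.
    pub fn update(&mut self, current_tick: u64, current_sleep_tick: u64) -> Option<f32> {
        let tick_difference = current_tick - self.previous_tick;
        let sleep_tick_difference = current_sleep_tick - self.previous_sleep_tick;
        self.previous_tick = current_tick;
        self.previous_sleep_tick = current_sleep_tick;
        if tick_difference == 0 {
            return None;
        }
        let usage = 1.0 - sleep_tick_difference as f32 / tick_difference as f32;
        Some(usage * 100.0)
    }
}
```

With snapshots of (1_000_000 total, 750_000 asleep) and then (2_000_000, 1_000_000), it reports 25% and then 75%.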

Everything is now in place, and you should see the CPU usage of your application printed in your terminal once per second.

Using interrupts

Now imagine that you have a task that is very CPU-intensive but not critical. Since Embassy tasks are cooperative and only yield at an .await, such a task can monopolize the processor and prevent every other task from running, including our cpu_usage task. This is very dangerous in the embedded world: we need low-priority tasks and high-priority tasks, with the latter never being blocked by the former. Fortunately, Embassy has a solution: multiple interrupt executors with different priorities!

Setting up interrupt executors is a little more involved than the previous case, but not by much. Here is a simple example that uses them:

#![no_std]
#![no_main]

use cortex_m_rt::entry;
use embassy_executor::InterruptExecutor;
use embassy_rp::gpio;
use embassy_rp::interrupt;
use embassy_rp::interrupt::{InterruptExt, Priority};
use embassy_time::{Duration, Instant, Ticker};
use {defmt_rtt as _, panic_probe as _};

static EXECUTOR_HIGH: InterruptExecutor = InterruptExecutor::new();
static EXECUTOR_LOW: InterruptExecutor = InterruptExecutor::new();

// Each software interrupt drives one executor.
#[interrupt]
unsafe fn SWI_IRQ_0() {
    EXECUTOR_HIGH.on_interrupt()
}

#[interrupt]
unsafe fn SWI_IRQ_1() {
    EXECUTOR_LOW.on_interrupt()
}

#[entry]
fn main() -> ! {
    let p = embassy_rp::init(Default::default());

    let led = gpio::Output::<'static>::new(p.PIN_25, gpio::Level::Low);

    // On Cortex-M a lower number means a higher priority: P1 preempts P3.
    interrupt::SWI_IRQ_0.set_priority(Priority::P1);
    let high_spawner = EXECUTOR_HIGH.start(interrupt::SWI_IRQ_0);
    interrupt::SWI_IRQ_1.set_priority(Priority::P3);
    let low_spawner = EXECUTOR_LOW.start(interrupt::SWI_IRQ_1);

    low_spawner.spawn(blink(led)).unwrap();

    // The tasks run inside the interrupt handlers; main just idles here.
    loop {}
}
...

This is very similar to the multiprio example in the Embassy repository, with one difference: we are not mixing interrupt executors with the regular thread-mode executor. I found it difficult to get the CPU usage measurement working in that case; if you manage to, please let me know!

With interrupt executors, we have direct access to the infinite loop. Now we only need a way to sleep and wait for an interrupt, which will wake the corresponding executor. As in the previous case, there is an ARM instruction for this exact purpose: wfi (wait for interrupt).

However, before calling this instruction we need to disable interrupts! If we don't, when an interrupt fires its handler runs immediately and polls the corresponding executor, and only after it finishes does our main loop continue. The measured time would then include not only the time spent sleeping but also the time spent running tasks, which is not what we want. With interrupts disabled, wfi still returns as soon as an interrupt becomes pending, but the handler only runs once interrupts are re-enabled, after we have read the time. So we wrap the measurement in a critical section, which is conveniently provided by the cortex_m crate. The final code is this:

...
use portable_atomic::{AtomicU64, Ordering};

static SLEEP_TICKS: AtomicU64 = AtomicU64::new(0);
...
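fn main() -> ! {
    ...
    // Run the monitor on the high-priority executor so that busy
    // low-priority tasks cannot starve it.
    high_spawner.spawn(cpu_usage()).unwrap();
    loop {
        // Mask interrupts so that a wake-up does not run its handler (and the
        // executor's tasks) before we have read the `after` timestamp.
        cortex_m::interrupt::free(|_cs| {
            let before = Instant::now().as_ticks();
            cortex_m::asm::wfi();
            let after = Instant::now().as_ticks();
            SLEEP_TICKS.fetch_add(after - before, Ordering::Relaxed);
        });
        // Leaving the critical section lets the pending interrupt run.
    }
}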
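A toy timeline model, with made-up numbers rather than real measurements, makes the masking argument concrete. The hypothetical recorded_sleep function returns the sleep time the loop would book when the core goes to sleep at sleep_start, an interrupt becomes pending at irq_at, and the woken executor then works for handler_ticks.

```rust
/// Toy model (in ticks) of one trip through the sleep loop.
/// Returns the sleep time the loop would record.
pub fn recorded_sleep(
    sleep_start: u64,
    irq_at: u64,
    handler_ticks: u64,
    interrupts_masked: bool,
) -> u64 {
    let after = if interrupts_masked {
        // Inside the critical section, wfi returns when the interrupt becomes
        // pending, but the handler is deferred, so `after` is read right away.
        irq_at
    } else {
        // Unmasked: the handler (and the executor's tasks) runs to completion
        // before execution reaches the line that reads `after`.
        irq_at + handler_ticks
    };
    after - sleep_start
}
```

Sleeping 100 ticks before a handler that works for 30 records 100 ticks with interrupts masked but 130 without, so the unmasked loop would count task time as idle time and under-report CPU usage.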

Notice that we don't need to modify the cpu_usage function, it works the same in both cases.

Conclusion

We have seen how to set up an Embassy project to measure CPU usage, using either the regular executor or interrupt executors. I find it a really useful feature that you should include in all your projects. In this repo you can find the complete examples and more!
