Practical Guide to Bare Metal C++

2026-03-07 · arobenko.github.io

Prior to describing various embedded (bare metal) development concepts I’d like to cover several basic needs that, I think, most developers will have to use in their products.

One of the basic needs during development is the ability to test various assumptions and invariants at runtime when compiling the application in DEBUG mode, and to remove the checks when compiling the application in RELEASE mode. Standard C++ reuses the assert() macro from the standard C library.

#include <cassert>

assert(some_condition);

The assert() macro evaluates to nothing when the NDEBUG symbol is defined; otherwise it evaluates the condition. If the condition evaluates to false, it calls the __assert_fail function provided by the standard library, which in turn calls printf to print an error message to the standard output, followed by a call to the abort function, which is supposed to terminate the application.

Both printf and abort are provided by the standard library. However, printf will require an implementation of the _write function to print characters to the debug output terminal, and abort will require an implementation of the _exit function to terminate the application.

If the standard library is excluded from the compilation (using the -nostdlib compilation option), the build will fail with an "undefined reference to __assert_fail" error message. The developer will have to implement this function with the correct signature. To retrieve the correct signature, open the assert.h standard header provided by your compiler. It will contain something like this:

void __assert_fail (const char *expr, const char *file, unsigned int line, const char *function) __attribute__ ((__noreturn__));

The attribute specifies that this function doesn’t return, so the compiler will generate a call to it without setting any address to return to.

The conclusion from all the above is that using the standard assert() macro is possible, but somewhat inflexible. Only global variables can be accessed from the functions described above, i.e. if there is a need to flash a LED to indicate assertion failure, its control must be accessible through global variables, which is a bit ugly. Another disadvantage of this approach is that there is no convenient way to temporarily change the assertion failure behaviour and later restore the original one. Such an ability may help to better identify the location of the failed assert: for example, override the default assertion failure behaviour to activate a specific LED at the entrance to some function, and restore the original behaviour when the function returns.

Below is a short description of a better way to handle assert checks and failures. The code is part of the embxx library.

To resolve the problems described above and handle assertions the C++ way, we will have to create a generic abstract class for assertion failure handling:

class Assert
{
public:
    virtual void fail(
        const char* expr,
        const char* file,
        unsigned int line,
        const char* function) = 0;
};

When implementing custom project-specific assertion failure behaviour, inherit from the class above:

#include "embxx/util/Assert.h"

typedef ... Led;

class LedOnAssert : public embxx::util::Assert
{
public:
    LedOnAssert(Led& led)
      : led_(led)
    {
    }

    virtual void fail(
        const char* expr,
        const char* file,
        unsigned int line,
        const char* function)
    {
        led_.on();
        while (true) {}
    }

private:
    Led& led_;
};

To manage an object of the class above, we will have to create a singleton class with a static instance. It will store a pointer to the currently registered assertion failure handler:

class AssertManager
{
public:
    static AssertManager& instance()
    {
        static AssertManager mgr;
        return mgr;
    }

    Assert* reset(Assert* newAssert = nullptr)
    {
        auto prevAssert = assert_;
        assert_ = newAssert;
        return prevAssert;
    }

    Assert* getAssert()
    {
        return assert_;
    }

    bool hasAssertRegistered() const
    {
        return assert_ != nullptr;
    }

    void infiniteLoop()
    {
        while (true) {}
    }

private:
    AssertManager() : assert_(nullptr) {}

    Assert* assert_;
};

The reset member function registers a new object that manages assertion failure behaviour and returns the previous one, which can be used later to restore the original behaviour.

We will require a new macro to check the assertion condition and invoke the registered failure behaviour:

#ifndef NDEBUG

#define GASSERT(expr) \
    ((expr) \
        ? static_cast<void>(0) \
        : (embxx::util::AssertManager::instance().hasAssertRegistered() \
            ? embxx::util::AssertManager::instance().getAssert()->fail( \
                #expr, __FILE__, __LINE__, GASSERT_FUNCTION_STR) \
            : embxx::util::AssertManager::instance().infiniteLoop()))

#else // #ifndef NDEBUG

#define GASSERT(expr) static_cast<void>(0)

#endif // #ifndef NDEBUG

Then, in case of a condition check failure, the GASSERT() macro checks whether any custom assertion failure functionality is registered and invokes its virtual fail function. If not, an infinite loop is executed.

To complete the whole picture we have to provide a convenient way to register new assertion failure behaviours:

template <typename TAssert>
class EnableAssert
{
    static_assert(std::is_base_of<Assert, TAssert>::value,
        "TAssert class must be derived class of Assert");

public:
    typedef TAssert AssertType;

    template <typename... Params>
    EnableAssert(Params&&... args)
      : assert_(std::forward<Params>(args)...),
        prevAssert_(AssertManager::instance().reset(&assert_))
    {
    }

    ~EnableAssert()
    {
        AssertManager::instance().reset(prevAssert_);
    }

private:
    AssertType assert_;
    Assert* prevAssert_;
};

From now on, all we have to do is instantiate an object of EnableAssert with the behaviour we want. Note that the constructor of the EnableAssert class can receive any number of parameters and forwards them to the constructor of the internal assert_ object.

int main(int argc, const char* argv[])
{
    ...
    Led led;
    embxx::util::EnableAssert<LedOnAssert> assertion(led);

    ... // Rest of the code
}

If there is a need to temporarily override the previous assertion failure behaviour, just create another EnableAssert object. Once the latter is out of scope (the object is destructed), previous behaviour will be restored.

int main(int argc, const char* argv[])
{
    ...
    Led led;
    embxx::util::EnableAssert<LedOnAssert> assertion(led);
    ...
    {
        embxx::util::EnableAssert<OtherAssert> otherAssertion(... /* some params */);
        ...
    } // Restore previously registered behaviour - LedOnAssert.
}

SUMMARY: The approach described above provides a flexible and convenient way to control how the failures of various debug mode checks are reported to the developer. All the modules in embxx library use the GASSERT() macro to verify their pre- and post-conditions as well as internal assumptions.

As has been mentioned in the Benefits of C++ chapter, the main reason for choosing C++ over C is code reuse. When some generic piece of code uses platform-specific code and needs to receive some kind of notifications from the latter, the need for a generic callback facility arises. C++ provides the std::function class for this purpose; it can be assigned any callable object, such as a lambda function or a std::bind expression:

class LowLevelPeripheral
{
public:
    template <typename TFunc>
    void setEventCallback(TFunc&& func)
    {
        eventCallback_ = std::forward<TFunc>(func);
    }

    void eventHandler()
    {
        if (eventCallback_) {
            eventCallback_(); // invoke registered callback object
        }
    }

private:
    std::function<void ()> eventCallback_;
};

class SomeGenericControl
{
public:
    SomeGenericControl()
    {
        periph_.setEventCallback(
            std::bind(&SomeGenericControl::eventCallbackHandler, this));
    }

    void eventCallbackHandler()
    {
        // Handle the reported event.
    }

private:
    LowLevelPeripheral periph_;
};

There are two problems with using std::function: it uses dynamic memory allocation, and it throws an exception if invoked without a callable object assigned to it first. As a result, std::function may not be suitable for most bare metal projects. We will have to implement something similar, but without dynamic memory allocation and without exceptions. Below is a short explanation of how to implement such a function class. The implementation of the StaticFunction class is part of the embxx library.

Since dynamic memory allocation cannot be used, an additional template parameter specifying the size of the storage area is required:

template <typename TSignature, std::size_t TSize = sizeof(void*) * 3>
class StaticFunction;

It seems that in most cases the callback object will contain a pointer to a member function, a pointer to the handling object, and some additional single parameter. This is the reason for the default storage space being equal to the size of 3 pointers. The "signature" template parameter is exactly the same as with std::function, plus an optional storage area size template parameter:

typedef embxx::util::StaticFunction<void (int)> MyCallback;
typedef embxx::util::StaticFunction<
    void (int, int),
    sizeof(void*) * 4> MyOtherCallback;

To properly implement operator(), there is a need to split the signature into the return type and the rest of the parameters. To achieve this the following template specialisation trick is used:

template <std::size_t TSize, typename TRet, typename... TArgs>
class StaticFunction<TRet (TArgs...), TSize>
{
public:
    ...
    TRet operator()(TArgs... args) const {...}
    ...

private:
    typedef ... StorageType; // Type of the storage area, will be explained later.

    StorageType handler_; // Storage area where the callback object is stored.
    bool valid_; // Flag indicating whether the storage area contains a valid
                 // callback; initialised to false in the default constructor.
};

The StaticFunction object needs an ability to store any type of callable object as its internal data member and then invoke it in its operator() member function. To support this functionality we will require additional helper classes:

template <std::size_t TSize, typename TRet, typename... TArgs>
class StaticFunction<TRet (TArgs...), TSize>
{
    ...
private:
    class Invoker
    {
    public:
        virtual ~Invoker() {}

        // Virtual invocation function
        virtual TRet exec(TArgs... args) const = 0;
    };

    template <typename TBound>
    class InvokerBound : public Invoker
    {
    public:
        template <typename TFunc>
        InvokerBound(TFunc&& func)
          : func_(std::forward<TFunc>(func))
        {
        }

        virtual ~InvokerBound() {}

        virtual TRet exec(TArgs... args) const
        {
            return func_(std::forward<TArgs>(args)...);
        }

    private:
        TBound func_;
    };
    ...
};

The callable object stored in the handler_ data area will be of type InvokerBound<...>, invoked through the interface of its base class Invoker.

There is a need to properly define StorageType for the handler_ data member:

static const std::size_t StorageAreaSize = TSize + sizeof(Invoker);
typedef typename std::aligned_storage<
    StorageAreaSize,
    std::alignment_of<Invoker>::value
>::type StorageType;

Note that StorageType is uninitialised storage with the alignment required to store an object of type Invoker. The InvokerBound<...> class will have the same alignment requirements as its base class Invoker, so it is safe to store any object of type InvokerBound<...> in the same area, as long as its size doesn't exceed the size of the StorageType.

Also note that the actual size of the storage area is the requested TSize plus the area required to store an object of the Invoker class. The size of an InvokerBound<...> object is the size of its private member plus the size of its base class Invoker, which contains a single (hidden) pointer to its virtual table.

Any callable object may be assigned to StaticFunction using either constructor or assignment operator:

template <std::size_t TSize, typename TRet, typename... TArgs>
class StaticFunction<TRet (TArgs...), TSize>
{
public:
    ...
    template <typename TFunc>
    StaticFunction(TFunc&& func)
      : valid_(true)
    {
        assignHandler(std::forward<TFunc>(func));
    }

    template <typename TFunc>
    StaticFunction& operator=(TFunc&& func)
    {
        destroyHandler();
        assignHandler(std::forward<TFunc>(func));
        valid_ = true;
        return *this;
    }
    ...

private:
    template <typename TFunc>
    void assignHandler(TFunc&& func)
    {
        typedef typename std::decay<TFunc>::type DecayedFuncType;
        typedef InvokerBound<DecayedFuncType> InvokerBoundType;

        static_assert(sizeof(InvokerBoundType) <= StorageAreaSize,
            "Increase the TSize template argument of the StaticFunction");

        static_assert(alignof(Invoker) == alignof(InvokerBoundType),
            "Alignment requirement for Invoker object must be the same "
            "as alignment requirement for InvokerBoundType type object");

        new (&handler_) InvokerBoundType(std::forward<TFunc>(func));
    }

    void destroyHandler()
    {
        if (valid_) {
            auto invoker = reinterpret_cast<Invoker*>(&handler_);
            invoker->~Invoker();
        }
    }
};

Note that the assignment operator must invoke the destructor of the previously assigned callable object before storing a new one in its place.

Also note the compile-time static_assert checks, which verify that the object to be stored fits into the allocated storage area and that the alignment requirements still hold.

The invocation of the function will be implemented like this:

template <std::size_t TSize, typename TRet, typename... TArgs>
class StaticFunction<TRet (TArgs...), TSize>
{
public:
    ...
    TRet operator()(TArgs... args) const
    {
        GASSERT(valid_);
        auto invoker = reinterpret_cast<const Invoker*>(&handler_);
        return invoker->exec(std::forward<TArgs>(args)...);
    }
    ...
};

Note that no exceptions are used, so the "must have" pre-condition for function invocation is that a valid callable object has been assigned to it. That is the reason for the assertion check in the body of the function.

To complete the implementation of StaticFunction class the following logic must also be implemented:

  1. Check whether the StaticFunction object is valid, i.e. has any callable object assigned to it.

  2. Default construction - the function is invalid and cannot be invoked.

  3. Copy/move construction + copy/move assignment functionality.

  4. Clearing the function (invalidating).

  5. Supporting both const and non-const operator() in the assigned callable object. It requires both const and non-const operator() implementations of StaticFunction as well as of its internal Invoker and InvokerBound<...> classes.

All this I leave as an exercise to the reader. The complete implementation of the functionality described above can be found in the embxx library.
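
For reference, points 1, 2 and 4 might be sketched as follows on a trimmed-down version of the class. This is an illustration only, not the embxx implementation: copy/move between StaticFunction objects and the const/non-const operator() split are still left out, and plain assert() stands in for GASSERT().

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

template <typename TSignature, std::size_t TSize = sizeof(void*) * 3>
class StaticFunction;

template <std::size_t TSize, typename TRet, typename... TArgs>
class StaticFunction<TRet (TArgs...), TSize>
{
public:
    StaticFunction() : valid_(false) {}            // (2) default: invalid

    template <typename TFunc>
    StaticFunction(TFunc&& func) : valid_(true)
    {
        assignHandler(std::forward<TFunc>(func));
    }

    ~StaticFunction() { destroyHandler(); }

    explicit operator bool() const { return valid_; }  // (1) validity check

    void clear()                                   // (4) invalidate
    {
        destroyHandler();
        valid_ = false;
    }

    TRet operator()(TArgs... args) const
    {
        assert(valid_); // GASSERT(valid_) in embxx
        auto invoker = reinterpret_cast<const Invoker*>(&handler_);
        return invoker->exec(std::forward<TArgs>(args)...);
    }

private:
    class Invoker
    {
    public:
        virtual ~Invoker() {}
        virtual TRet exec(TArgs... args) const = 0;
    };

    template <typename TBound>
    class InvokerBound : public Invoker
    {
    public:
        template <typename TFunc>
        InvokerBound(TFunc&& func) : func_(std::forward<TFunc>(func)) {}

        virtual TRet exec(TArgs... args) const
        {
            return func_(std::forward<TArgs>(args)...);
        }

    private:
        TBound func_;
    };

    static const std::size_t StorageAreaSize = TSize + sizeof(Invoker);
    typedef typename std::aligned_storage<
        StorageAreaSize,
        std::alignment_of<Invoker>::value>::type StorageType;

    template <typename TFunc>
    void assignHandler(TFunc&& func)
    {
        typedef typename std::decay<TFunc>::type DecayedFuncType;
        typedef InvokerBound<DecayedFuncType> InvokerBoundType;
        static_assert(sizeof(InvokerBoundType) <= StorageAreaSize,
            "Increase the TSize template argument of the StaticFunction");
        new (&handler_) InvokerBoundType(std::forward<TFunc>(func));
    }

    void destroyHandler()
    {
        if (valid_) {
            auto invoker = reinterpret_cast<Invoker*>(&handler_);
            invoker->~Invoker();
        }
    }

    StorageType handler_;
    bool valid_;
};
```

Note that clear() reuses destroyHandler(), so the destructor of the stored callable runs exactly once even if the StaticFunction object is destructed afterwards.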

Another essential need in embedded development is an ability to serialise data. Most embedded products read data from some kind of sensors and/or communicate with the control centre via some wired or wireless serial interface.

Before data is sent via a communication link it must be serialised into a buffer, and when received, it must be deserialised from the bytes in a buffer on the other end. The data may be serialised using big or little endian, depending on the communication protocol in use. The embxx library provides generic code with the ability to read and write integral values from/to any buffer; the relevant functions are described below.

The functions below (defined in namespace embxx::io) support read and write of an integral value using any type of iterator:

template <typename T, typename TIter>
void writeBig(T value, TIter& iter);

template <typename T, typename TIter>
T readBig(TIter& iter);

template <typename T, typename TIter>
void writeLittle(T value, TIter& iter);

template <typename T, typename TIter>
T readLittle(TIter& iter);

These functions receive a reference to an iterator over a buffer/container. As bytes are read/written from/to the buffer, the iterator is incremented. The iterator can be of any type as long as it supports dereferencing (operator*()), pre-increment (operator++()) and assignment to the dereferenced object. For example, serialising several values of various lengths into an array using big endian:

std::uint8_t buf[128];
auto iter = &buf[0];

std::uint16_t value1 = 0x0102;
std::uint32_t value2 = 0x03040506;
std::uint64_t value3 = 0x0708090a0b0c0d0e;

embxx::io::writeBig(value1, iter);
embxx::io::writeBig(value2, iter);
embxx::io::writeBig(value3, iter);

The contents of the buffer will be: {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, ...}

Similar code for reading the values from the buffer would be:

std::uint8_t buf[128];
auto iter = &buf[0];

auto value1 = embxx::io::readBig<std::uint16_t>(iter);
auto value2 = embxx::io::readBig<std::uint32_t>(iter);
auto value3 = embxx::io::readBig<std::uint64_t>(iter);

Another example is serialising data into a container that has a push_back() member function, such as std::vector or a circular buffer. The data will be appended to the end of the existing data:

std::vector<std::uint8_t> buf;
auto iter = std::back_inserter(buf); // Will call push_back on assignment

// The writes below will use push_back for every byte.
embxx::io::writeBig(value1, iter);
embxx::io::writeBig(value2, iter);
embxx::io::writeBig(value3, iter);

Depending on the communication protocol, there may be a need to serialise only part of a value. For example, some field of the protocol may be defined as having only 3 bytes, while the value will probably be stored in a variable of type std::uint32_t. There is a similar set of functions, but with an additional template parameter that specifies how many bytes to read/write:

template <std::size_t TSize, typename T, typename TIter>
void writeBig(T value, TIter& iter);

template <typename T, std::size_t TSize, typename TIter>
T readBig(TIter& iter);

template <std::size_t TSize, typename T, typename TIter>
void writeLittle(T value, TIter& iter);

template <typename T, std::size_t TSize, typename TIter>
T readLittle(TIter& iter);

So reading/writing 3 bytes will look like the following:

auto value = embxx::io::readBig<std::uint32_t, 3>(iter);
embxx::io::writeBig<3>(value, iter);

Sometimes the endianness of data serialisation may depend on some traits class parameters. In order to choose between the "Little" and "Big" variants of the functions at compile time instead of runtime, the tag parameter dispatch idiom must be used.

There are similar read/write functions, but instead of being differentiated by name they have additional tag parameter to specify the endianness of serialisation:

/// Same as writeBig<T, TIter>(value, iter)
template <typename T, typename TIter>
void writeData(T value, TIter& iter, const traits::endian::Big& endian);

/// Same as writeBig<TSize, T, TIter>(value, iter)
template <std::size_t TSize, typename T, typename TIter>
void writeData(T value, TIter& iter, const traits::endian::Big& endian);

/// Same as writeLittle<T, TIter>(value, iter)
template <typename T, typename TIter>
void writeData(T value, TIter& iter, const traits::endian::Little& endian);

/// Same as writeLittle<TSize, T, TIter>(value, iter)
template <std::size_t TSize, typename T, typename TIter>
void writeData(T value, TIter& iter, const traits::endian::Little& endian);

/// Same as readBig<T, TIter>(iter)
template <typename T, typename TIter>
T readData(TIter& iter, const traits::endian::Big& endian);

/// Same as readBig<TSize, T, TIter>(iter)
template <typename T, std::size_t TSize, typename TIter>
T readData(TIter& iter, const traits::endian::Big& endian);

/// Same as readLittle<T, TIter>(iter)
template <typename T, typename TIter>
T readData(TIter& iter, const traits::endian::Little& endian);

/// Same as readLittle<TSize, T, TIter>(iter)
template <typename T, std::size_t TSize, typename TIter>
T readData(TIter& iter, const traits::endian::Little& endian);

The traits::endian::Big and traits::endian::Little are defined as empty tag classes:

namespace traits
{

namespace endian
{

struct Big {};

struct Little {};

} // namespace endian

} // namespace traits

template <typename TTraits>
class SomeClass
{
public:
    typedef typename TTraits::Endianness Endianness;

    template <typename TIter>
    void serialise(TIter& iter) const
    {
        embxx::io::writeData(data_, iter, Endianness());
    }

private:
    std::uint32_t data_;
};

So the code above is not aware of what endianness is used to serialise the data. It is provided as an internal type of the traits class, named Endianness. The compiler will generate a call to the appropriate writeData() function, which in turn forwards it to writeBig() or writeLittle().

To serialise data using big endian, the traits should be defined as follows:

struct MyTraits
{
    typedef embxx::io::traits::endian::Big Endianness;
};

SomeClass<MyTraits> someClassObj;

someClassObj.serialise(iter); // Will serialise using big endian

The interface described above is easy and convenient to use and quite easy to implement using a straightforward approach. However, every variation of the template parameters creates an instantiation of new binary code, which may cause significant code bloat if not used carefully. Consider the following:

  • Read/write of signed vs unsigned integer values. The serialisation/deserialisation code is identical for both cases, but won't be recognised as such when the functions are instantiated. To optimise this case, implement the read/write operations only for unsigned values, and make the "signed" functions wrappers around them. Don't forget the sign extension operation when retrieving a partial signed value.

  • The read/write operations are more or less the same for values of any length, i.e. of any type: (unsigned) char, (unsigned) short, (unsigned) int, etc. To optimise this case, implement an internal function that receives the length of the serialised value as a runtime parameter, and make the functions described above mere wrappers around it.

  • Usage of the iterators also requires caution. For example, reading values may be performed using a regular iterator as well as a const_iterator, i.e. an iterator pointing to const values. These are two different iterator types that will duplicate the "read" functionality if both are used:

char buf[128] = {};
const char* iter1 = &buf[0];
char* iter2 = &buf[0];

// Instantiation 1
auto value1 = embxx::io::readBig<std::uint16_t>(iter1);

// Instantiation 2
auto value2 = embxx::io::readBig<std::uint16_t>(iter2);

It is possible to optimise the case above for random access iterators by using a temporary pointer to unsigned characters to read the required value. After the retrieval is complete, just increment the passed iterator by the number of characters read.
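
A sketch of that idea (with hypothetical helper names, not the embxx code): both iterator types funnel into a single non-template routine that works on pointers to const unsigned char, and the caller's iterator is advanced afterwards by the number of bytes consumed.

```cpp
#include <cstddef>
#include <cstdint>

// Single shared routine: big endian read through a byte pointer.
// Instantiated only once, regardless of the iterator types used above it.
inline std::uint32_t readBigUnsigned(const std::uint8_t* bytes, std::size_t len)
{
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < len; ++i) {
        value = (value << 8) | bytes[i];
    }
    return value;
}

// Thin wrapper instantiated per iterator type; the heavy lifting
// stays in the shared routine above.
template <typename T, typename TIter>
T readBigViaPointer(TIter& iter)
{
    auto* bytes = reinterpret_cast<const std::uint8_t*>(&(*iter));
    auto value = static_cast<T>(readBigUnsigned(bytes, sizeof(T)));
    iter += sizeof(T); // advance the caller's iterator by the bytes consumed
    return value;
}
```

With this split, instantiating the wrapper for both char* and const char* duplicates only a few instructions, not the whole read loop.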

All the considerations stated above require a quite complex implementation of the serialisation/deserialisation functionality, with multiple levels of abstraction, which is beyond the scope of this book. It would be a nice exercise to try to implement it yourself. Another option is to use the code as is from the embxx library.
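
As an illustration of the sign extension mentioned in the first point (a sketch with a hypothetical helper name): after a partial signed value has been read into an unsigned carrier, the sign bit of the serialised width must be propagated to the remaining high bits.

```cpp
#include <cstddef>
#include <cstdint>

// Sign-extend a value occupying 'len' bytes (1..4) of an unsigned
// carrier into a full-width signed result.
inline std::int32_t signExtend(std::uint32_t value, std::size_t len)
{
    std::uint32_t signBit = std::uint32_t(1) << (len * 8 - 1);
    if ((value & signBit) != 0) {
        value |= ~((signBit << 1) - 1); // set all bits above the sign bit
    }
    return static_cast<std::int32_t>(value);
}
```

For example, the 3-byte serialised value 0xffffff must become -1 once widened to std::int32_t, not 16777215.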

There is almost always a need to have some kind of queuing functionality. A circular buffer is a good compromise between speed of execution and memory consumption (vs std::deque, for example). If your product allows usage of dynamic memory allocation and/or exceptions, then boost::circular_buffer can be a good choice. However, if dynamic memory allocation is not an option, then there is no other choice but to implement a circular buffer with a maximum length known at compile time, over a C array or std::array. The embxx library provides such a StaticQueue implementation. I won't go into too much detail or explain every line of code; instead I will emphasise several important points that must be taken into consideration.

There can always be an attempt to perform an invalid operation, such as accessing an element outside the queue boundaries, inserting a new element when the queue is full, or popping an element when the queue is empty. The conventional way to handle these cases in C++ is to throw an exception, but in embedded and especially bare metal programming this is not an option. The right way to handle these errors is to assert on pre-conditions. The StaticQueue implementation in the embxx library uses the GASSERT() macro described earlier. The checks are compiled in only in non-Release mode (NDEBUG not defined), and in case of failure they invoke the project-specific assertion failure code the developer has provided.

template <typename T, std::size_t TSize>
class StaticQueue
{
public:
    ...
    void popFront()
    {
        GASSERT(!empty());
        ...
    }
};

When the queue is created it doesn't contain any elements. However, it must contain uninitialised space where elements can be constructed later. The space must be of sufficient size and properly aligned.

template <typename T, std::size_t TSize>
class StaticQueue
{
public:
    typedef T ValueType;
    ...

private:
    typedef typename std::aligned_storage<
        sizeof(ValueType),
        std::alignment_of<ValueType>::value
    >::type StorageType;

    typedef std::array<StorageType, TSize> ArrayType;

    ArrayType array_;
    ...
};

When adding a new element to the queue, the “in-place” construction must be performed:

template <typename T, std::size_t TSize>
class StaticQueue
{
public:
    typedef T ValueType;
    ...

    template <typename U>
    void pushBack(U&& newElem)
    {
        auto* spacePtr = ...; // get pointer to the right place
        new (spacePtr) ValueType(std::forward<U>(newElem));
        ...
    }
};

When an element is removed from the queue, explicit destruction must be performed:

template <typename T, std::size_t TSize>
class StaticQueue
{
public:
    typedef T ValueType;
    ...

    void popBack()
    {
        auto* spacePtr = ...; // get pointer to the right place
        auto* elemPtr = reinterpret_cast<ValueType*>(spacePtr);
        elemPtr->~ValueType(); // call the destructor
        ...
    }
};

There is often a need to iterate over the elements of the queue. Standard sequential random access containers such as std::array, std::vector or std::deque may use a simple pointer (or a wrapper class around it) as an iterator, because the address of every element is greater than the address of its predecessor. Incrementing the pointer during iteration is enough to reach the next element. However, in a circular queue/buffer the address of the beginning of the queue may be greater than the address of the end of the queue:

Non linearised queue image

In this case a simple pointer is not enough as an iterator; a wrap-around check is needed when incrementing it. However, always using this kind of iterator may incur undesired performance penalties. That is where the "linearisation" concept comes in. When the queue is linearised, the address of every element is greater than the address of its predecessor, and a simple pointer (a linearised iterator) may be used to iterate over all the elements in the queue:

Linearised queue image

When the queue is not linearised, it must either be linearised (which may be a bit expensive, depending on the size of the queue) or iterated over in two stages: first the first (top) part, then the second (bottom) part. The StaticQueue implementation in the embxx library provides two functions, arrayOne() and arrayTwo(), that return these two ranges.

However, there may be a need to read/write data from/to the queue without worrying about the wrap-around case. A good example is a circular queue/buffer containing data read from some communication interface, such as a serial port, from which a 4-byte value needs to be deserialised; the most convenient way would be to use embxx::io::readBig<4>(iter), described previously. To properly support this case we need a somewhat more expensive iterator that handles the wrap-around when incremented and/or dereferenced. This is the reason for having two types of iterators for StaticQueue: LinearisedIterator and Iterator. The former is a simple typedef for a pointer which can be used only on the linearised part of the queue; the latter may be used when it is unknown whether a wrap-around will occur during the iteration.
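
The wrap-around handling at the heart of such an iterator boils down to modular index arithmetic; a minimal sketch of the increment step:

```cpp
#include <cstddef>

// Advance an index into circular storage of the given capacity,
// wrapping back to the beginning past the last cell.
inline std::size_t nextIndex(std::size_t idx, std::size_t capacity)
{
    ++idx;
    if (idx == capacity) {
        idx = 0;
    }
    return idx;
}
```

This extra comparison on every increment is exactly the cost that a plain pointer over the linearised part of the queue avoids.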

When defining a new custom iterator class, there is a need to properly support std::iterator_traits for it. The traits are used to implement functions such as std::advance and std::distance. The requirement is to define the following internal types:

template <typename T, std::size_t TSize>
class StaticQueue
{
public:
    class Iterator
    {
    public:
        typedef std::random_access_iterator_tag iterator_category;
        typedef T value_type;
        typedef T* pointer;
        typedef T& reference;
        typedef typename std::iterator_traits<pointer>::difference_type difference_type;
        ...
    };
    ...
};

Care must be taken when copying/moving elements between queues. The compiler is not aware of the actual type of the elements stored in the queue, and the number of valid elements is unknown at compile time. A default copy/move constructor or assignment operator would therefore copy raw bytes of the storage space between the queues. That may work for basic types or POD structs, but it is not the right way to do the copying in general: the valid elements must be copied/moved using their copy/move constructors (on construction) or copy/move assignment operators (on assignment), and the garbage data in the unused space must not be touched.
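
A sketch of what the element-wise copy looks like (a hypothetical helper, not the embxx code): each valid source element is copy-constructed into the destination's uninitialised storage, instead of copying raw bytes.

```cpp
#include <cstddef>
#include <new>

// Copy-construct 'count' elements from 'src' into the uninitialised
// storage at 'dest'. A raw memcpy of the storage would be wrong for
// non-trivial types.
template <typename T>
void copyConstructRange(const T* src, std::size_t count, void* dest)
{
    auto* destElems = static_cast<T*>(dest);
    for (std::size_t i = 0; i < count; ++i) {
        new (destElems + i) T(src[i]); // copy-construct, not byte copy
    }
}
```

The move variant is analogous, with std::move(src[i]) passed to the placement new expression; elements beyond 'count' (the garbage space) are never touched.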

In addition to regular copy/move constructors and assignment operators, there may also be a need to provide copy/move construction and/or copy/move assignment from the queue that contains elements of the same type, but has different capacity:

template <typename T, std::size_t TSize>
class StaticQueue
{
public:
    ...
    template <std::size_t TAnySize>
    StaticQueue(const StaticQueue<T, TAnySize>& queue)
      : Base(&array_[0], TSize)
    {
        ... // Copy all the elements from the other queue
    }

    template <std::size_t TAnySize>
    StaticQueue(StaticQueue<T, TAnySize>&& queue)
      : Base(&array_[0], TSize)
    {
        ... // Move all the elements from the other queue
    }

    template <std::size_t TAnySize>
    StaticQueue& operator=(const StaticQueue<T, TAnySize>& queue)
    {
        ... // Copy all the elements from the other queue
    }

    template <std::size_t TAnySize>
    StaticQueue& operator=(StaticQueue<T, TAnySize>&& queue)
    {
        ... // Move all the elements from the other queue
    }
    ...
};

As confirmed in the Templates chapter, any difference in the value of a template parameter creates a new instantiation of executable code. It means that having multiple queues of the same type but different sizes may bloat the executable in an unacceptable way. The best way to solve this problem is to define a base class that is templated only on the type of the stored values and implements the whole logic of the queue, while the derived StaticQueue class just provides the necessary storage area and reuses (wraps) all the functions implemented in the base class:

namespace details
{

template <typename T>
class StaticQueueBase
{
protected:
    typedef T ValueType;
    typedef typename std::aligned_storage<
        sizeof(ValueType),
        std::alignment_of<ValueType>::value
    >::type StorageType;
    typedef StorageType* StorageTypePtr;

    StaticQueueBase(StorageTypePtr data, std::size_t capacity)
      : data_(data),
        capacity_(capacity),
        startIdx_(0),
        count_(0)
    {
    }

    template <typename U>
    void pushBack(U&& value) {...}

    ... // All other API functions

private:
    StorageTypePtr data_; // Pointer to the storage area
    std::size_t capacity_; // Capacity of the storage area
    std::size_t startIdx_; // Index of the beginning of the queue
    std::size_t count_; // Number of elements in the queue
};

} // namespace details

template <typename T, std::size_t TSize>
class StaticQueue : public details::StaticQueueBase<T>
{
    typedef details::StaticQueueBase<T> Base;
    typedef typename Base::StorageType StorageType;

public:
    StaticQueue()
      : Base(&array_[0], TSize)
    {
    }

    template <typename U>
    void pushBack(U&& value)
    {
        Base::pushBack(std::forward<U>(value));
    }

    ... // Wrap all other API functions

private:
    typedef std::array<StorageType, TSize> ArrayType;
    ArrayType array_;
};

There are ways to optimise even further. Let's take queues of int and unsigned values as an example. They have the same size, and from the queue implementation's perspective there is no difference in handling them, so it would be a waste of code space to instantiate the same binary code for the queue twice to handle both of these types. Using template specialisation tricks we may implement queues of signed integral types as mere wrappers around queues of unsigned integral types. Another example is storage of pointers: it would be wise to specialise StaticQueue of pointers to be a wrapper around a queue of void* pointers, or even of integral unsigned values of the same size as a pointer (such as std::uint32_t on a 32-bit architecture or std::uint64_t on a 64-bit one).
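A sketch of the pointer specialisation idea, using a greatly simplified stand-in for the StaticQueue above (the real class has the full API elided earlier; only the wrapping technique is shown here):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the article's StaticQueue: fixed capacity,
// no wrap-around logic, just enough API to demonstrate the idea.
template <typename T, std::size_t TSize>
class StaticQueue
{
public:
    void pushBack(const T& value) { data_[count_++] = value; }
    T& operator[](std::size_t idx) { return data_[idx]; }
    std::size_t size() const { return count_; }

private:
    std::array<T, TSize> data_{};
    std::size_t count_ = 0;
};

// Partial specialisation for pointers: every StaticQueue<T*, TSize>
// reuses the code instantiated once for std::uintptr_t instead of
// stamping out identical machine code per pointee type.
template <typename T, std::size_t TSize>
class StaticQueue<T*, TSize>
{
public:
    void pushBack(T* value)
    {
        base_.pushBack(reinterpret_cast<std::uintptr_t>(value));
    }

    T* operator[](std::size_t idx)
    {
        return reinterpret_cast<T*>(base_[idx]);
    }

    std::size_t size() const { return base_.size(); }

private:
    StaticQueue<std::uintptr_t, TSize> base_;
};
```

Note that the wrapper stores std::uintptr_t rather than void*; this sidesteps the problem that a specialisation on T* would otherwise match its own void* storage and recurse.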

Thanks to template specialisation there are virtually no limits to the optimisations we may apply. However, I would like to remind you of the well-known saying, "premature optimisation is the root of all evil". Please avoid optimising your StaticQueue implementation until the need arises.



Comments

  • By myrmidon 2026-03-10 13:03

    > There are multiple articles of how C++ is superior to C, that everything you can do in C you can do in C++ with a lot of extras, and that it should be used even with bare metal development

    An interesting perspective. Could turn it around as "everything you can do in C++ you can do in C with a lot less language complexity".

    My personal experience with low-level embedded code is that C++ is rarely all that helpful, tends to bait you into abstractions that don't really help, brings additional linker/compiler/toolchain complexity and often needs significant extra work because you can't really leverage it without building C++ abstractions over provided C-apis/register definitions.

    Would not generally recommend.

    • By jonathrg 2026-03-10 13:11

      You definitely need discipline to use C++ in embedded. There are exactly 2 features that come to mind, which makes it worth it for me: 1) replacing complex macros or duplicated code with simple templates, and 2) RAII for critical sections or other kinds of locks.
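      The RAII-for-critical-sections point can be sketched as follows. The platform hooks here are hypothetical stubs operating on a flag (on a real Cortex-M they would wrap the PRIMASK intrinsics); the guard pattern itself is the point:

```cpp
#include <cstdint>

// Hypothetical platform hooks, stubbed with a flag so the sketch is
// self-contained; real ones would wrap __get_PRIMASK()/__disable_irq().
static bool g_irqsEnabled = true;

inline std::uint32_t platformDisableIrqs()
{
    bool wasEnabled = g_irqsEnabled;
    g_irqsEnabled = false;
    return wasEnabled ? 1u : 0u;  // previous state, for safe nesting
}

inline void platformRestoreIrqs(std::uint32_t saved)
{
    g_irqsEnabled = (saved != 0u);
}

// RAII guard: interrupts are disabled for exactly the guard's lifetime
// and restored on every exit path, including early returns.
class IrqLock
{
public:
    IrqLock() : saved_(platformDisableIrqs()) {}
    ~IrqLock() { platformRestoreIrqs(saved_); }

    IrqLock(const IrqLock&) = delete;
    IrqLock& operator=(const IrqLock&) = delete;

private:
    std::uint32_t saved_;
};
```

      Because the guard saves and restores the previous state rather than unconditionally re-enabling, nested critical sections compose correctly.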

      • By kevin_thibedeau 2026-03-10 18:37

        Consteval is great for generating lookup tables without external code generators. You can use floating point freely, cast the result to integers, and then not link any soft float code into the final binary.
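        A sketch of that technique (C++17 constexpr is used here for portability; C++20 consteval would additionally guarantee compile-time evaluation). The sine approximation is hand-rolled because constexpr std::sin is not guaranteed by the standard library before C++26:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Compile-time sine via a truncated Taylor series, so no constexpr
// std::sin support is required from the library.
constexpr double sinApprox(double x)
{
    double term = x;
    double sum = x;
    for (int n = 1; n < 10; ++n) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

// Quarter-wave lookup table scaled to 16 bits.  All floating-point
// maths runs inside the compiler; only the integer table reaches the
// binary, so no soft-float routines get linked.
constexpr std::array<std::uint16_t, 256> makeSineTable()
{
    std::array<std::uint16_t, 256> table{};
    for (std::size_t i = 0; i < table.size(); ++i) {
        double angle =
            1.57079632679 * static_cast<double>(i) / (table.size() - 1);
        table[i] =
            static_cast<std::uint16_t>(sinApprox(angle) * 65535.0 + 0.5);
    }
    return table;
}

constexpr auto SineTable = makeSineTable();

static_assert(SineTable[0] == 0, "table must start at zero");
static_assert(SineTable[255] == 65535, "table must end at full scale");
```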

    • By kryptiskt 2026-03-10 14:59

      > Could turn it around as "everything you can do in C++ you can do in C with a lot less language complexity".

      No, you can't, C is lacking a lot that C++ brings to the table. C++ has abstraction capabilities with generic programming and, dare I say it, OO that C has no substitute for. C++ has compile-time computation facilities that C has no substitute for.

      • By embeng4096 2026-03-10 15:50

        Is there an example of the generic programming that you've found useful?

        The extent of my experience has been being able to replace functions like convert_uint32_to_float and convert_uint32_to_int32 by using templates to something like convert_uint32<float>(input_value), and I didn't feel like I really got much value out of that.

        My team has also been using CRTP for static polymorphism, but I also feel like I haven't gotten much more value out of having e.g. a Thread base class and a derived class from that that implements a task function versus just writing a task function and passing it xTaskCreate (FreeRTOS) or tx_thread_create (ThreadX).
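        For readers unfamiliar with the pattern being discussed, a minimal CRTP sketch (names like Thread/taskFunc are illustrative, not from any particular RTOS):

```cpp
// CRTP base: the scheduler-facing entry point is resolved at compile
// time -- no vtable, no virtual-call overhead.
template <typename TDerived>
class Thread
{
public:
    // Static trampoline with the void(void*) signature RTOS task
    // creation functions typically expect.
    static void taskEntry(void* param)
    {
        static_cast<TDerived*>(param)->taskFunc();
    }
};

class BlinkThread : public Thread<BlinkThread>
{
public:
    void taskFunc()
    {
        ++iterations_;  // a real task would toggle an LED and block
    }

    int iterations() const { return iterations_; }

private:
    int iterations_ = 0;
};
```

        A real system would pass &BlinkThread::taskEntry and the object's address to something like xTaskCreate; whether that buys much over passing a plain function, as the commenter notes, is debatable.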

        Typed compile-time computation is nice, though, good point. constexpr and such versus untyped #define macros.

        • By csb6 2026-03-10 17:06

          The generic algorithms that come with the C++ standard library are useful. Once you get used to using them you start to see that ad-hoc implementations of many of them get written repeatedly in most code. Since most of the algorithms work on plain arrays as well as more complex containers they are still useful in embedded environments.
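          For example, the standard algorithms accept plain pointers as iterators, so the usual hand-rolled search and count loops can be replaced even where no dynamic containers exist (function names here are illustrative):

```cpp
#include <algorithm>
#include <cstddef>

// std::count_if over a raw buffer: pointers are valid random-access
// iterators, no container required.
int countAtOrAbove(const int* buf, std::size_t len, int threshold)
{
    return static_cast<int>(
        std::count_if(buf, buf + len,
                      [threshold](int v) { return v >= threshold; }));
}

// std::find_if likewise; returns buf + len when nothing matches.
const int* findFirstAbove(const int* buf, std::size_t len, int threshold)
{
    return std::find_if(buf, buf + len,
                        [threshold](int v) { return v > threshold; });
}
```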

        • By rkagerer 2026-03-11 2:18

          I had been programming for a long time before I learned OOP. After some years playing with it, I came to the conclusion there's not much I can't do about as well using simple functions and structs. The key is a well thought out and organized codebase. Always felt polymorphism in particular seemed more trouble than it was worth.

          I still use modern languages on a regular basis, but when I drop back to more basic languages there are only a few ergonomics that I truly miss (eg. generics).

        • By slaymaker1907 2026-03-10 16:26

          std::array can sometimes give you the best of both worlds for stack allocation in that you statically constrain the stack allocation size (no alloca) while guaranteeing that your buffers are large enough for your data. You can also do a lot of powerful things with constexpr that are just not possible with arrays. It is very convenient for maintaining static mappings from enums to some other values.
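          The enum-mapping idea can be sketched like this (enum values and divisor numbers are hypothetical; the point is the compile-time-checked table):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

enum class Baud : std::size_t { B9600, B19200, B115200, NumValues_ };

// Static enum-to-value mapping in a constexpr std::array: no heap, the
// size is tied to the enum, and lookups work in constant expressions.
constexpr std::array<std::uint32_t,
                     static_cast<std::size_t>(Baud::NumValues_)>
    BaudDivisors = {
        469,  // B9600   -- hypothetical divisors for some fixed clock
        234,  // B19200
        39    // B115200
    };

constexpr std::uint32_t divisorFor(Baud b)
{
    return BaudDivisors[static_cast<std::size_t>(b)];
}

// Caught at compile time if the table and enum drift apart.
static_assert(divisorFor(Baud::B115200) == 39, "table out of sync");
```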

        • By jonathrg 2026-03-11 16:04

          Having a type safe generic ring buffer and such is nice

        • By CyberDildonics 2026-03-10 19:02

          You've never used a template for a data structure and you've never used a destructor to free memory?

      • By myrmidon 2026-03-10 16:08

        My point is trivially true as far as computability goes, but that is not what I meant.

        All those abstraction capabilities can be a big detriment to any project, because they always come with a cost, and runtime is far from the only concern.

        Specifically in an embedded project, toolchain complications and memory use (both RAM and code) are potentially much bigger concerns than for Desktop applications, and your selection of programmers is more limited as well; might be much more feasible to lock your developers onto acceptable C coding standards than to make e.g. "template metaprogramming" a necessary prerequisite for your codebase and then having to teach your applicants electrical engineering.

        Both object-oriented programming and compile-time computation are doable in a C codebase; they just need more boilerplate and maybe a code-generator step in your build, respectively. But that might well be an advantage, discouraging frivolous use of complexity that you don't actually need and that introduces hidden costs (understanding, ease of change, compile time) elsewhere.

      • By bigfishrunning 2026-03-10 19:57

        > C++ has compile-time computation facilities that C has no substitute for.

        The substitute for this is that C is insanely easy to generate. Do your compile time computation in your build system.

        OO is also pretty trivial in C -- the Linux kernel is a great example of a program in C that is very Object Oriented.

    • By Conscat 2026-03-10 17:59

      > An interesting perspective. Could turn it around as "everything you can do in C++ you can do in C with a lot less language complexity".

      C can't parameterize an optimal layout fixed-length bitset with zero overhead, nor can it pragmatically union error sets at scale.

    • By bsoles 2026-03-11 2:18

      I find that encapsulation of devices (UART, I2C, etc.) as classes in C++ rather than global functions that take structs, etc. as input arguments in C to be much more manageable. Same for device drivers for individual sensor ICs.
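      A minimal sketch of what that encapsulation might look like. The register layout, bit position, and class shape are all hypothetical; real addresses and flags come from the device header:

```cpp
#include <cstdint>

// Hypothetical register block; a real one comes from the vendor header
// and is mapped at a fixed peripheral address.
struct UartRegs
{
    volatile std::uint32_t SR;  // status register
    volatile std::uint32_t DR;  // data register
};

// The register block and driver state travel together as one object,
// instead of a global struct being threaded through free functions.
class Uart
{
public:
    explicit Uart(UartRegs* regs) : regs_(regs) {}

    bool txReady() const { return (regs_->SR & TxEmptyBit) != 0; }

    void send(std::uint8_t byte)
    {
        while (!txReady()) {
            // busy-wait for the transmit register to empty
        }
        regs_->DR = byte;
    }

private:
    static const std::uint32_t TxEmptyBit = 1u << 7;  // assumed bit

    UartRegs* regs_;
};
```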

      • By myrmidon 2026-03-11 2:45

        Maybe, but I'd argue that this is a matter of taste, and what does it actually do for you?

        In practice: Is it worth wrapping all your vendor-provided hardware abstraction in classes that you write yourself? Designing something like that to be "complete" (i.e. allow full access to all hardware functionality like IRQ/DMA setup) and to still stay hardware-independent is often borderline impossible anyway, and any attempt is invariably going to make your codebase more difficult to understand.

    • By g947o 2026-03-10 13:07

      Mind if I ask whether you speak of that from a professional embedded system engineer's perspective?

      • By myrmidon 2026-03-10 13:17

        I do. But talking about low-level embedded stuff here.

        Generally, the more you deviate from your vendors "happy path", the more busy work/unexpected difficulties you will run into, and a solid grasp of how exactly architecture and toolchain work might become necessary (while staying on the "happy path" allows you to stay blissfully unaware).

        • By technothrasher 2026-03-10 13:25

          I struggle with this deviating from the vendor's "happy path" often. I mostly use the STM32 chips, and I don't particularly care for their HAL library. I find it over complicated and often has bugs in it that I have to track down and fix. But boy is it nice to use their STM32CubeMX program to generate all the low level code so I can just get to work. I tend to end up building my own low level libraries during my free time because I enjoy it and it gives me a better idea of how the hardware is actually working, but using the STM32 HAL library to write my actual client code at work.

          • By bsoles 2026-03-11 2:26

            Also same experience here. I can write UART code with DMA in 20 lines of code on an STM32 microcontroller. Same functionality using HAL is astonishingly cumbersome. The reference manual and willingness to read it is all you need.

          • By patchnull 2026-03-10 15:08

            [flagged]

        • By embeng4096 2026-03-10 14:33

          +1 to this and your above points (the embedded team I'm on has started using C++ over the last year or so).

          I've definitely learned a lot, and I like the portability of CMake for cross-platform use (our team uses all 3 of Windows, Mac, and Linux). My experience sounds much like yours: there've been a lot of times where using the vendor's flavor of Eclipse-based IDE (STM32CubeIDE, Renesas e2studio, etc) would have saved us a lot of discovered work, or extra work adapting the "happy path" to CMake and/or C++.

          Using C++ and/or CMake is fine when it's part of the happy path and for simpler things e.g. STM32CubeMX-generated CMake project + simple usage of HAL. For more complex things like including MCUboot or SBSFU, etc, it's forced me to really dig in. Or even just including FreeRTOS/ThreadX, we've created abstractions like a Thread class on top -- sometimes it's nice and convenient, others it feels like unnecessary complexity, but maybe I'm just not used to C++ yet.

          One clear, simple example is needing to figure out CMake and Ninja install. In an Eclipse-based IDE, you install the platform and everything Just Works(tm). I eventually landed on using scoop to install CMake and Ninja, which was an easy solution and didn't require editing my PATH, etc, but that wasn't the first thing I tried.

          • By adrian_b 2026-03-10 20:26

            I have never seen any advantage of CMake over the much simpler GNU make.

            Ninja is supposed to be faster for compiling very big software projects, but I have never seen an embedded software project that is well organized and which is not compiled in a few seconds on a powerful development computer with many cores, so I do not see the benefit of Ninja for such projects.

            All Eclipse-based IDEs that I have ever seen are extremely slow for anything, both for editing and for building a project, and they make the simplest project management operations extremely complicated. Even Visual Studio Code is much faster and more convenient than using Eclipse-based IDEs. Other solutions can be much faster.

            While the example programs provided for STM32 MCUs are extremely useful for starting a project for them, I believe that using the methods of project building provided by the vendor results in a waste of time. I have always obtained better results and faster development by building a GNU toolchain (e.g. binutils,gcc,gdb, some libc) from scratch and by using universal GNU makefiles, which work for any CPU target and for any software project, with the customization of a few Make variables. I have written once a set of GNU Makefiles, according to its manual, around 1998, and I have never had to change them since then, regardless what platform I had as a target. For any new platform, there is just a small set of variables that must be changed by generating one per-platform included file, with things like the names of the compilers and other tools that must be invoked and their command-line options.

            For new projects, there is one very small file that must be generated for each binary file that must be built, which must contain the type of the file (e.g. executable, static library, shared library) and a list with one or more directories where source files should be searched. No changes are needed when source files are created, deleted, moved or renamed, and dependencies are identified automatically. I am always astonished when I see how many totally unnecessary complications exist in the majority of the project building configurations that I have ever seen provided by the vendors or in most open-source projects.

            • By chris_money202 2026-03-11 8:51

              Just switched one of my team's firmware projects from make to CMake + Ninja (with the help of GHCP). Build time went from 10 minutes to 2 minutes and now we have the ability to build just what has changed.

              • By myrmidon 2026-03-11 9:11

                You might have just had a bad makefile.

                The "canonical" way to just build what was changed with make is to let the compiler generate header dependencies during the build and to include those (=> if you change only 1 header, just the affected translation units get rebuilt). Works like a charm (and never messes up incremental builds by not rebuilding on a header-only change).
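                That canonical pattern, sketched as a fragment (variable names like OBJS are assumed, not from any particular project): the compiler writes a .d dependency file per object, and the makefile includes them on subsequent runs.

```make
DEPFLAGS = -MMD -MP   # emit foo.d per object, with phony targets per header

%.o: %.c
	$(CC) $(CFLAGS) $(DEPFLAGS) -c $< -o $@

# The '-' suppresses errors on the first build, before any .d files exist.
-include $(OBJS:.o=.d)
```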

                If you did not have proper incremental builds before, I would blame the makefile specifically and not make itself.

                Another way to mess up with make is to not allow anything in parallel, either by missing flags (-j) or by having a rule that does everything sequentially.

                Ninja does have less overhead than make, but your build time should be dominated by how long compiler and linker take; such a big difference between the two indicates to me that your old make setup was broken in some way.

                • By chris_money202 2026-03-11 13:43

                  It definitely was, but that's the issue with make: it lets that sloppiness enter the build system much more easily. CMake gives some rigidity and reuse to shield the casual dev from mucking up the build definition. You can probably do the same things with make and CMake; it's like a C vs C++ argument. And what I say to that is usually: just do what you prefer to use and maintain.

    • By chris_money202 2026-03-11 1:36

      With anything low level you should choose the language you’re most comfortable and competent in imo. Learning both the platform and the language is just asking for headaches, control what you can

    • By bigfishrunning 2026-03-10 19:50

      I would honestly extend this sentiment out to all code. The benefits C++ has over C are much better served by Rust or Go or Python or Lisp, and if "Simple" is what you want, then C is a much better choice.

      Honestly, I can't think of a single job for which C++ is the best tool.

  • By embeng4096 2026-03-10 14:39

    I took a brief skim through so apologies if I missed that it was mentioned, but wanted to bring up the Embedded Template Library[0]. The (over)simplified concept is: it provides a statically-allocated subset (but large subset) of the C++ standard library for use in embedded systems. I used it recently in a C++ embedded project for statically-allocated container/list types, and for parsing strings, and the experience was nice.

    [0]: https://www.etlcpp.com/

    • By maldev 2026-03-10 18:53

      So I use C++ heavily in the kernel. But couldn't you just set your own allocator and a couple other things and achieve the same effect and use the actual C++ STL? In kernel land, at the risk of oversimplifying, you just implement allocators and deallocators and it "just works", even on C++26.

      • By nly 2026-03-10 19:47

        Do you typically just compile with -fno-rtti -fno-exceptions -nostdlib ?

        Last time I did embedded work this was basically all that was required.

        • By nulltrace 2026-03-10 22:13

          Those three flags cover most of it. One gotcha: -fno-exceptions makes `new` return nullptr instead of throwing, so if any library code expects exceptions you get silent corruption. We added -fcheck-new to catch that.

          Also -nostdlib means no global constructors run, so static objects with nontrivial ctors need you to call __libc_init_array yourself.

  • By pjmlp 2026-03-10 12:09

    While tag dispatching used to be a widely used idiom in C++ development, it was a workaround for which nowadays there are much better alternatives, such as constexpr and concepts.

    • By tialaramex 2026-03-10 13:02

      Surely one of the obvious reasons you'd want tagged dispatch in C++ isn't obviated by either of those features? Or am I missing something?

      Suppose Doodads can be constructed from a Foozle either with the Foozle Resigned or with the Foozle Submitted. Using tagged dispatch we make Resigned and Submitted types and the Doodad has two specialised constructors for the two types even though substantively we pass only the Foozle in both cases.
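      That idea, sketched with the commenter's own hypothetical names (this is the disambiguation-tag style, like std::defer_lock in the standard library):

```cpp
struct Foozle { int value; };

// Empty tag types: no data, just a name for overload resolution.
struct ResignedTag {};
struct SubmittedTag {};

class Doodad
{
public:
    // Same Foozle payload, two semantically different "named"
    // constructors distinguished purely by the tag parameter.
    Doodad(ResignedTag, const Foozle& f)
      : value_(f.value), resigned_(true) {}

    Doodad(SubmittedTag, const Foozle& f)
      : value_(f.value), resigned_(false) {}

    bool wasResigned() const { return resigned_; }
    int value() const { return value_; }

private:
    int value_;
    bool resigned_;
};
```

      At the call site, Doodad d(ResignedTag{}, f); reads almost like a named constructor, which is the closest C++ gets to Rust's Vec::with_capacity.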

      In a language like Rust all the constructors have names: it's obvious what Vec::with_capacity does, while you will still see C++ programmers who think constructing a std::vector with a single integer parameter does the same, because it's just a constructor and you'd need to memorize what actually happens.

      • By quuxplusone 2026-03-10 13:31

        I wouldn't call the idiom you describe (like with unique_lock's defer_lock_t constructor) "tag dispatch"; to me, one defining characteristic of the "tag dispatch idiom" is that the tag you're dispatching on is computed somehow (e.g. by evaluating iterator_traits<T>::iterator_category()). The idiom you're describing, I'd call simply "a constructor overload set" that happens to use the names of "disambiguation tags" to distinguish semantically different constructors because — as you point out — C++ doesn't permit us to give distinct names to the constructor functions themselves.

        For more on disambiguation tags, see https://quuxplusone.github.io/blog/2025/12/03/tag-types/

        and https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p39...

      • By pjmlp 2026-03-10 13:52

        You use if constexpr with requires expressions, to do poor man's reflection on where to dispatch, and eventually with C++26, you do it with proper reflection support.

HackerNews