Device Drivers – RTOS vs. Linux

Typical RTOS device driver model

Let’s take a quick look at a typical RTOS device driver and throw it in the ring with a Linux driver.

The device driver’s philosophy
From a general perspective, a device driver is usually a set of routines enabling typical applications to talk to the hardware. In other words, device drivers are often translators, translating the application’s language into the hardware’s language. Device drivers also enable application programs to ignore hardware dependencies and focus on the actual job at hand.

The RTOS device driver
The device drivers for an RTOS, as shown in the diagram, are usually a bit different from GPOS or firmware device drivers. Device drivers can be broadly classified into three types: OS-less drivers, GPOS drivers and RTOS drivers.

OS-less drivers are usually just a bunch of routines enabling applications without an OS to do their job. Usually such applications are bootloaders, firmware boot-up routines or simple round-robin systems.

The device drivers for a GPOS, for the most part, are also a bunch of device access routines. However, the key difference between drivers for OS-less systems and drivers for GPOS systems lies in the concurrency of driver operation and in the entry and exit points. More often than not, the drivers also have to deal with security.

However, RTOS drivers make their home in an entirely different world. They should follow the same basic principles and philosophy as the RTOS itself: driver execution should be predictable and should enable the whole system to be predictable and fault tolerant.

Let’s consider a scenario to picture the above. Say we open a file from a DVD-ROM in Windows Explorer, and the read encounters an error, causing the hardware and software to enter a retry loop. From Explorer, under a normal scenario, one cannot end that loop with a keystroke or even from Task Manager. Under Linux, the only way to end that loop (only the software loop) is to kill the process. Even then, the driver continues to retry until the given number of retries is exhausted, and only then tries to return an error code to the process, only to find that the process is long gone.

Under a typical RTOS scenario, the driver starts executing the reads and enters the same failure loop. The difference starts here. Both the driver and the application start their operations with predefined timeout values. When the driver issues a command to the hardware, it waits for the ISR’s response only for the timeout period. Hence, if the driver does not get a response from the hardware within the given time, it can choose to return an error to the application, indicating that the hardware needs more time to do the job.
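To make the timeout idea concrete, here is a minimal sketch, in Python for brevity (a real driver would be C on the target; `read_with_timeout`, `issue_command` and `isr_done` are illustrative names, not any particular RTOS API). The driver issues the command, then waits on the semaphore the ISR will release, but only for a bounded time:

```python
import threading

class DriverTimeout(Exception):
    """Raised when the hardware does not respond within the deadline."""

def read_with_timeout(issue_command, isr_done, timeout_s):
    """Issue a command to the hardware, then wait on the semaphore the
    ISR releases -- but only for 'timeout_s'. On timeout, report the
    error instead of blocking the caller forever."""
    issue_command()
    if not isr_done.acquire(timeout=timeout_s):
        # Hardware did not respond in time: fail fast, let the
        # application decide whether to retry.
        raise DriverTimeout("hardware did not respond in time")
    return "data"
```

In a test harness one can simulate the ISR by releasing the semaphore from a timer thread; the bounded wait is exactly what keeps the driver’s worst-case response time known.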

On the other hand, the drivers for most monolithic operating systems or GPOS’ are mostly a bunch of routines with data structures tied to the calling process or a file descriptor’s context. Hence, when executed by multiple processes, the driver code contends for the hardware at the lowest level; the first one wins, while the others sleep on the hardware lock. Thus, the calling process, if it did not get the lock, has to sleep/block. In some cases, the driver can also export a little framework to register the request and return immediately, responding later with the data, e.g. async file IO. Though this allows the application process to be a bit more predictable in its execution, the application still lacks complete predictability. However, such a driver is a lot more scalable, with high performance on more capable hardware. More on that some other time.

The RTOS driver, on the other hand, for the most part runs its own thread managing the requests. A typical driver implementation is depicted in the picture above. Let’s see what makes this driver under an RTOS more predictable.

The driver thread maintains a message queue. The application thread sends a request message along with its own queue, which should receive the response. The request message contains the data necessary for the driver to execute the request, say a read location or a pixel/framebuffer write. Thus, once the application sends the request, it has merely registered its request with the driver and is free to do other stuff till it gets its response back. The application can then sleep on the response queue with a timeout, leaving the CPU free to do other jobs instead of either continuously polling the queue or waiting indefinitely. If the application is back in execution and finds that it has not yet received the response from the driver, it may choose to do some other stuff, such as informing the user or taking other corrective actions.
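The application side of that exchange can be sketched as follows (again Python standing in for C; `send_request` and `wait_response` are made-up names for illustration). The request carries a private reply queue, and the timed `get` is the “sleep on the response queue with a timeout” step:

```python
import queue

def send_request(driver_q, payload):
    """Application side: register the request with the driver task,
    carrying a private reply queue, and return immediately so the
    caller can do other work before waiting."""
    reply_q = queue.Queue(maxsize=1)
    driver_q.put({"payload": payload, "reply_to": reply_q})
    return reply_q

def wait_response(reply_q, timeout_s):
    """Sleep on the reply queue with a timeout. None means the driver
    has not responded yet; the caller can inform the user or take
    other corrective action instead of blocking forever."""
    try:
        return reply_q.get(timeout=timeout_s)
    except queue.Empty:
        return None
```

The key design point is that nothing in the application ever blocks without a deadline.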

What makes the driver predictable is not much different from the above scenario. The driver picks up the first request from the message queue and starts executing the job. It issues the request to the hardware and waits for the response on a semaphore, with a timeout, shared between the driver and the ISR. The hardware, after the job is done, responds by raising an interrupt, causing the CPU to run the ISR. The ISR in turn releases the semaphore, waking the driver process up again. The driver process then responds with the result to the application. If there was an error, the hardware does not respond in time; the driver wakes up with a timeout error, figuring out that the hardware failed to do the job in time. Thus, the driver responds with an error to the application and issues an abort to the hardware for the current job. The application can then choose whether or not to retry, by sending another request. For some real-time systems, however, the driver retries a couple of times itself before giving up. The application, upon deciding to abort the current operation, issues a high-priority message to the driver process, causing it to process that one immediately, like a backdoor, thus aborting the current operation.
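The driver side of the loop, with the bounded retries, can be sketched like this (illustrative names again; `start_io` stands for the hardware command issue, `isr_done` for the semaphore the ISR releases):

```python
import queue
import threading

def driver_task(request_q, start_io, isr_done, timeout_s, retries=2):
    """Driver thread body: serve one request at a time. Issue the I/O,
    wait on the ISR semaphore with a timeout; on timeout, retry a
    bounded number of times, then report the error so the application
    can decide whether to retry or fail over."""
    while True:
        req = request_q.get()
        if req is None:                       # shutdown sentinel
            return
        for _ in range(retries + 1):
            start_io(req["payload"])
            if isr_done.acquire(timeout=timeout_s):
                req["reply_to"].put(("ok", req["payload"]))
                break
        else:
            # All attempts timed out: bounded failure, not a hang.
            req["reply_to"].put(("error", "hardware timeout"))
```

Because every wait has a deadline and every failure path replies to the caller, the worst-case service time of a request is computable up front.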

The facilities of priorities, such as task priorities, and of timeouts against sleeping contexts, such as semaphore waits, enable the designer to build a highly predictable system. Often, life support systems are real time systems with redundancies built in. That means, if a hardware unit fails to do its job in time, the driver immediately turns the device off, bringing the system to a known state, and picks the secondary hardware to do the job. The same is the case with mission critical systems, be they nuclear control systems or avionics. As this shows, handling redundancies is another specialty of an RTOS device driver.

However, the message queue and the serialization of the code with priorities often tend to limit the scalability of such systems.

For high performance hardware which may take in multiple commands at once and respond with results one by one, such as DMA controllers, the ISR, instead of releasing a semaphore, sends a message to the same driver queue with the results read from the hardware. The driver, when it gets a chance to process the response message, looks at the context stored by the ISR, figures out which application message it belongs to, and responds with the appropriate result.
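That completion-message variant can be sketched as below (Python sketch; the `tag` is whatever context the hardware hands back, and `pending` is the driver’s map of in-flight requests; all names are illustrative):

```python
import queue

def isr_completion(driver_q, tag, data):
    """ISR side for command-queueing hardware: instead of releasing a
    semaphore, post a completion message -- carrying the hardware's
    context tag -- to the same queue the driver task already serves."""
    driver_q.put({"kind": "completion", "tag": tag, "data": data})

def handle_completion(driver_q, pending):
    """Driver side: match the completion back to the original
    application request via the ISR's tag, then reply to that
    application's queue."""
    msg = driver_q.get_nowait()
    req = pending.pop(msg["tag"])
    req["reply_to"].put(msg["data"])
```

The tag lookup is what lets several commands be outstanding at once while replies still reach the right caller.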

That closes the topic of RTOS drivers for now. I’ll try to throw some light on tasks and task priorities some time.


Hardware with RTOS & Device Drivers

Let’s look at how the hardware lives with RTOS.


The hardware, at least from the RTOS perspective, can be broadly classified into two types. The first is what is used by the RTOS itself; the second is what the RTOS supports the applications with.

The fundamental hardware required for a given RTOS is the CPU, memory and some boot store, the red blocks. Most RISC CPUs tend to have a small static memory on the die, about a few KB. Though a given RTOS can be implemented using just the internal memory, the RTOS still needs to manage the memory resources for itself, for house-keeping and of course for the applications. Hence, external memory is usually also a required hardware resource, though not something the RTOS strictly cannot do without.

The additional hardware, or the ‘application’ hardware, the blue blocks, is not used by the RTOS itself, but it should have drivers developed and running on top of the RTOS kernel to make up the BSP, or Board Support Package. Unless all four blocks are effectively covered by the RTOS and the BSP, an application cannot use the complete system effectively.

Next, let’s look at the boot up process till we bring up the applications, which should help us cover all the hardware blocks.


Let’s start with the Power On Self Test, a.k.a. POST, and the pre-boot phase. These routines must be stored in permanent non-volatile storage, usually ROM or flash. The POST and pre-boot phase normally begin with execution jumping to the reset vector. In other words, the POST and pre-boot stuff live at the reset vector. If you got mail for them, you know where to deliver now.

Alright, fun aside. The POST, as the name suggests, is a set of routines helping the processor verify that the coast is clear for it to boot. Once the processor gets a go from the POST, the pre-boot kicks in. Usually, the pre-boot routines are packaged and shipped along with the POST and most probably the boot loader itself.

The pre-boot routines set up a minimal environment for the boot loader to take over. At a minimum, they initialize the internal memory, if available; otherwise, they have to enable the external memory controller and initialize the external memory itself, e.g. SDRAM. Once the memory is initialized, the pre-boot routines give way to the boot loader. Occasionally, the pre-boot routines relocate the boot loader from the permanent storage to RAM.

Ah, the big guy, the boot loader. Usually, the boot loader has more than one job to do. Boot loaders are often designed to do a bit more than just setting up the memory space, configuring the resources, loading the kernel and OS into memory and passing control to it. Often, boot loaders are also designed to download OS images, verify the filesystems, uncompress the OS images or even give early access to board diagnostics. In any case, let’s just follow the regular boot path.

Once the boot loader is in control, it sets up a bit more of the hardware, configures the resources and address space, fiddles with caching and starts unpacking the OS image. In some cases, the OS image may need a bit of uncompression; in others it’s good to go as-is. Usually, the OS image has its home in the flash store.

In some cases, there could be two flash stores: a bit of high-reliability parallel flash at the reset vector and a high-capacity serial or parallel flash living somewhere else. The latter may even be NAND flash or a hard disk. In such cases, the boot loader should also know a bit about how the images are stored in the secondary store, e.g. the filesystem, and have drivers to access the media.

If the boot loader store is separate, it could be for multiple reasons. One, a flash store large enough for the entire OS image could be really costly, because it has to be the high-reliability NOR type. Two, OS upgrades may impact the stability of the boot loader. Another: a stray OS write could overwrite and corrupt the boot loader itself. Also, a serial flash store such as SPI flash, or even an SD card, cannot provide parallel access during the POST phase.

On with the tour. Once the boot loader gets hold of the OS image, it may unpack it into RAM, uncompressing it if needed. Some smaller systems may run the OS directly off the flash; in such cases, the boot loader simply passes control to the OS image after checking for any pending OS update events. Otherwise, the loading into RAM goes first and then control is passed to the OS boot-up routines.

Once the OS gets control, the boot-up part of the kernel routines takes charge. These routines verify internal resources, set up address spaces etc. Occasionally, the uncompressing job is handled by the pre-kernel part itself. The pre-kernel may also set up the stack for the kernel, the heap and other minimal memory management stuff. Usually, the pre-kernel messes with only a minor part of the available RAM, leaving the rest for the big guys. Finally, the kernel kicks in.

The kernel does a bit of housekeeping work, such as reconfiguring the memory regions and bringing up the memory management and resource management stuff such as traps, task queues, timer queues etc. Once the kernel is done with the housekeeping, it finally enters the scheduling routine, from where it loads the task whose address was given to it at shipping time and starts it. A simple mechanism is a predefined function, with its resources also predefined. In a way, we may call this the init task.

Usually, timer routines are provided to the kernel by the BSP. Just before starting off the first task, the timer is kicked, the ISRs are set up and control is passed to the init task. This way, the RTOS can flex its time slicing muscle at tasks that refuse to give the CPU up.

The init task, by design, spawns a bunch of tasks. The ones that go in first are the device driver tasks. If you are familiar with *nix, you may ask, what does this mean? Let’s look at the RTOS drivers, a bit later, to understand what they really are.

Once the driver tasks are up and running, the application tasks are brought in. These are the guys who do the final job. Be it showing up some video on the screen or issuing a defibrillating shock to the heart or even launching an atom bomb, depending upon the hardware resources and what they are programmed to do.

One thing to notice in the context of RTOS applications is that they work directly on the hardware, with the drivers being just a bunch of synchronized routines for doing the dirty job of working with the hardware. In other words, RTOS applications, for the most part, should be aware of the hardware to do their job. There are a few exceptions, but most real time tasks do their job, of course, both As Simple As Possible and As Quick As Possible.

Though some POSIX and other library implementations exist on a given RTOS, for the most part, at least for time critical jobs, the task should know what it is dealing with. Abstractions are distractions.

Coming up next: A typical RTOS device driver.

Components of RTOS

(Or how can RTOS do what it does)

Let’s look at a typical Real Time system implementation.


The RTOS component of the real time system is just the micro kernel part, the yellow region. The RTOS exports just a set of basic components as part of the OS: a priority based task scheduler, a minimalistic memory management subsystem, a set of timers, some task synchronization facilities like semaphores and condition variables, and other such OS services. The timers and synchronization facilities trigger the scheduler almost every time they fire. If time slicing is implemented, the timer also issues a regular tick which causes the currently running task to be preempted and the next ready task to be scheduled.

As one can notice, the memory space is linear, often a physical address space. There is a single execution domain: no user/kernel modes of execution. The device drivers are just tasks or threads of execution running at higher priority, as can be seen above. From the RTOS perspective, tasks are often just threads of execution.

Let’s try to look at each of the subsystems and components.

The scheduler is usually a simple round-robin scheduler with priorities. Usually, the scheduler maintains a set of task queues, their number depending upon the number of priorities supported. If a system supports 16 task priorities, the scheduler usually maintains a multiple of 16 task queues, and each task is queued in one of its priority’s queues. The task queues might consist of, at least, ready-to-run and sleeping queues. Some relatively simple OSes may choose to have a single queue for both ready-to-run and sleeping tasks. Even simpler systems, supporting very few priorities, e.g. 4, maintain a single queue for all the tasks in all states. However, such systems should have a very small number of tasks; otherwise the scheduler may end up eating all the CPU cycles.

Almost every kernel subsystem invokes the scheduler. A timer event, a semaphore or mutex release and, in some cases, even an ISR return can invoke the scheduler. The scheduler scans for the next highest priority task to be run. If a new higher priority task has come up in the run queue, the current task is put to sleep and the newer one starts executing. The scheduler can be locked out by tasks running critical code, but any use of such a facility should be reviewed, and reviewed again, with every caution. Otherwise, the system may deadlock, behavior may become unpredictable or the system may fail to meet its requirements.

If time slicing is configured and enabled, even tasks running at the same priority may be preempted to run the next task in the same priority queue. At the timer event for time slicing, the current task is appended at the end of its priority’s run queue and the next task is picked up, only after ensuring no higher priority tasks are ready to run.
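The scan-and-rotate behavior of such a scheduler can be captured in a few lines (a Python sketch of the idea, not any real kernel’s code; `pick_next` and the queue layout are illustrative):

```python
from collections import deque

def pick_next(run_queues):
    """run_queues[0] is the highest priority level. Scan downward for
    the first non-empty ready queue; rotating the chosen task to the
    back of its own queue gives round-robin time slicing within a
    priority level, while a ready higher-priority task always wins."""
    for q in run_queues:
        if q:
            task = q.popleft()
            q.append(task)   # time slice: go to the back of this level
            return task
    return None              # nothing ready: fall through to the idle task
```

Note how the higher-priority check is free here: it is simply the scan order.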

Task priorities are one of the serious issues that tend to give designers nightmares. However, a system with properly decided task priorities runs almost bullet proof. I have personally witnessed instabilities during the development of real time systems without properly calculated/assigned priorities, until the task priorities achieved their perfect balance. From that point on, development, testing and deployment were just a walk in the park.

The memory manager, often, tends to be a simple slab allocator and collector, designed to be as simple as possible. Most often, a typical RTOS tends to export a linear physical address space to all of its tasks: no segmentation, no logical addressing, no virtual addressing and paging. Some RTOSes do provide such facilities, though the tasks utilizing them may not be guaranteed to meet real time performance goals. The address space or memory for the memory manager is allocated at compile time itself. Often, the stack gets its own space allocation and the heap its own. In some cases, the stack for a new task is also allocated from the heap, and in such cases the memory manager might need to do a little extra job.
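A fixed-size block pool captures the spirit of such an allocator: all memory reserved up front, O(1) allocate and free, and a deterministic failure instead of blocking. A minimal sketch (in Python for readability; `BlockPool` is an illustrative name, and a real RTOS pool would hand out addresses, not indices):

```python
class BlockPool:
    """A minimal fixed-size block allocator of the kind an RTOS memory
    manager might use: the whole region is reserved at 'compile time',
    and alloc/free just push and pop a free list."""

    def __init__(self, block_size, block_count):
        self.block_size = block_size
        self.storage = bytearray(block_size * block_count)
        self.free_list = list(range(block_count))  # indices of free blocks

    def alloc(self):
        if not self.free_list:
            return None          # deterministic failure, never blocks
        return self.free_list.pop()

    def free(self, block_index):
        self.free_list.append(block_index)
```

Because every operation is constant time, allocation never threatens the timing guarantees of the task that calls it.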

Timers are an integral part of any given RTOS. The timers, in my view, are the heartbeat of the RTOS. Without timer events, achieving real time performance is really hard, though not impossible.

The timer subsystem can be broadly divided into two parts: the timer event generation part and the timer queues. The timer event generation part usually depends on the timer device driver for the specific platform. Once the system is up and running, the timer events are generated as configured at compile time. Timer events, often at millisecond resolution, can invoke either the scheduler or the timer queue management part. If the timer queue management part is invoked, all the timer queues are scanned for events that need to be shuffled, e.g. a task sleeping on a condition; the tasks with expired timers are added to their respective priority run queues and the scheduler is invoked. Often, different hardware timers are used for time slicing and for timer queue management.
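The queue-shuffling step can be sketched as follows (illustrative Python; a real kernel would keep the timer queue sorted by expiry so the scan stops early, and `timer_tick` is a made-up name):

```python
def timer_tick(now, timer_queue, run_queues):
    """Timer-queue management, invoked on the timer interrupt: move
    every task whose timeout has expired from the timer queue to the
    run queue of its priority. The scheduler would be invoked next."""
    still_sleeping = []
    for task in timer_queue:
        if task["wake_at"] <= now:
            run_queues[task["priority"]].append(task)
        else:
            still_sleeping.append(task)
    timer_queue[:] = still_sleeping   # keep the unexpired entries
```

Keeping this scan short and bounded is exactly why very simple systems must also keep their task counts small.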

The timer subsystem can also post events to tasks that subscribe to them. A task can subscribe to timer events at a particular interval, as per the compile time configuration, and receive timer events at the configured intervals. Software watchdog systems are one of the best examples. A software watchdog can kill and restart a task, or even reset the system, if it did not receive a predefined event from the given task or subsystem within the given time.
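The watchdog check itself is tiny; the sketch below (illustrative names, tick counts standing in for real timer units) shows the decision a software watchdog makes on each of its subscribed timer events:

```python
def watchdog_check(now, last_heartbeat, limit_ticks, on_failure):
    """Software watchdog: if the monitored task has not posted its
    heartbeat within 'limit_ticks', fire the corrective action --
    restart the task, or reset the whole system. Returns True if the
    action fired."""
    if now - last_heartbeat > limit_ticks:
        on_failure()
        return True
    return False
```

The corrective action is passed in as a callback because what "recovery" means (task restart vs. system reset) is a system design decision, not a timer subsystem one.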

The RTOS may also export other subsystems such as system management, interrupt management, libraries for applications and even POSIX compliant libraries. Some RTOSes may even come with predefined tasks, such as a networking or TCP/IP stack and filesystem management tasks.

Let’s look at device drivers, hardware interaction and user task related stuff in the next post.

Real Time Operating Systems – A view from the shore

The Real Time Operating System from a common man’s perspective.

Computing systems can broadly be classified into generic computing systems, embedded computing systems and high performance computing systems. The line between generic computing systems, such as desktop computing systems, and embedded computing systems is getting blurred day by day.

An operating system functions as the mind of a given computing system. It creates and manages jobs, it deals with allocating and managing resources and much more. Often, operating systems come bundled with user interfaces and programmer friendly libraries.

While desktop operating systems can be complex monsters, embedded operating systems usually tend to be much simpler and lighter. As modern embedded systems have become more powerful than earlier general purpose computing systems, desktop operating systems such as Windows and Linux are finding their way into the embedded computing space, mostly in their miniature forms.

Embedded operating systems can be classified into real-time (RTOS) and non real-time systems. The non real-time systems can further be divided into two classes: simple round-robin kernels and general purpose OSes (GPOS) for embedded systems.

The RTOS is the key component in implementing systems with predictable responses. Imagine a mobile processing system, which has very limited resources: a slow processor and often little RAM and storage. The resources are pretty small compared to the job they have to do: process the call data, process the control information, play a song or shoot a picture. Without a way to implement software with predictable behavior on top of the hardware, such a system simply cannot keep up.

However, an RTOS has its own set of limitations. First, an RTOS cannot, at least at its heart, enforce security boundaries such as user/kernel space. Hence, typical RTOS systems cannot run user programs or third party applications. Though there are some exceptions, they run at low priority and consume extra resources. In either case, the application space is limited. Let’s look at that in some other post.

Besides, adding any more tasks can easily change the predictable behavior of the overall system. In most cases, only the tasks below the new task’s priority are impacted. However, since the system is designed as a specific purpose system, complex interdependencies change their equations, changing the overall behavior of the system.

On the other hand, a General Purpose Operating System, or GPOS, does not have these or other such limitations that an RTOS has. However, a GPOS cannot guarantee a predictable execution environment with constrained resources.

Let’s consider the case of a slower PC with a Windows or Linux operating system. If you open a file or open the browser while you’re playing a video or even a song in the background, the video or the song will be interrupted for brief durations. To run multiple such tasks smoothly, the PC needs more than a minimum of resources, while a mobile phone can handle such tasks even with minimal resources.

The advantage of an embedded GPOS is that it allows users to build and deploy third party applications without worrying about security or other such issues. That might be the convincing reason behind the multiple processors often found in smart phones.

Let’s take a quick look at RTOS components in the next post.
