Tech

Earliest Eligible Virtual Deadline First: How the Linux Kernel Schedules Jobs

Rohan Arya Varma

12 May 2024 — 3 min read

In my graduate class on Operating Systems, we learnt about the ins and outs of the Linux kernel, or at least we did so as much we could over thirteen weeks of lectures. OS isn't really isn't my strong suit, but elegant solutions to complex problems never fail to captivate me, and process scheduling in the current version of the Linux kernel is one such example.

Earliest Eligible Virtual Deadline First (EEVDF) is a scheduling algorithm that has been adopted by the kernel in v6.6 (Oct 2023), even though the paper introducing it was published in 1995. Ideally, we want any scheduling algorithm to be fair, that is, to fairly allocate computing resources to all tasks. We don't want a situation where either high or low priority tasks, or short or long tasks dominate the CPU and cause the starvation of others. EEVDF is a solution that results in fairness while also respecting task priority.

Eligibility and Lag

The eligibility of a task is determined by a simple comparison: the lag of the process must be more than or equal to zero, where lag is the difference between how long the task has run for and how long the task should have run for. Each task has a "weight" that directly corresponds to priority (or in kernel terms, "niceness"). Higher priority tasks (i.e., those with a lower niceness) have higher weights. How long a given task should have run for at a given time is determined by calculated the average share of the weight of the task from the time the task was submitted to the the given time. If a job has a larger weight, then it occupies a larger share of the total weight of all tasks over time, and hence it should get to run for a longer period of time.

However, calculating this integral is impossible due to changes in the size of the task queue and computational limitations. The concept of virtual time side steps some of these issues and and helps us derive a few results that, after certain approximations and assumptions, makes determining whether a task is eligible or not practically viable.

Before that, two more terms need to be covered. The eligible time for a task is the time at which the duration it has run for is the same as the duration it should have run for. From this point onward, given that the task has not been run, it is eligible to be run. Once a task is eligible, given that it is requesting a service time of r, the deadline of the task is the time where the task should have received a service time of r after being eligible.

Virtual Time

Virtual time is the the ratio of how long any task should run for at a given real time to the weight of the task. Virtual time is, in a sense, time slowed down according to the total weight.

Interesting mathematics follows, which I won't get into here, but now we have a virtual eligible time and a virtual deadline, which correspond to the eligible time and deadline mentioned above but in virtual time. Most importantly, the virtual deadline is the sum of the virtual eligible time and the service time requested scaled by the weight of a task. That is, time flows more slowly for higher priority tasks. If there's anything you take away from this article, it should be this.

Implementation in Kernel

EEVDF employs fixed time slices. This means it does not scale the time it allows access to the CPU for any given task. If a task needs more time, it will simply be scheduled another time slice once its current one expires. More interesting mathematics follows, but, practically, a task is considered eligible if its own virtual runtime is lower than the average virtual runtime of all tasks.

The virtual runtime for a task is updated by incrementing it with a virtualized time slice, that is, a time slice scaled according to the weight of the task, when it is scheduled the fixed time slice by the kernel. The virtual deadline, then, is simplified to the sum of the virtual runtime of the task and this virtualized time slice.

Earliest Eligible Virtual Deadline

All tasks in the task queue are stored in a red-black tree (a form of balanced binary search tree) indexed on virtual runtime, where each node is augmented with the minimum virtual deadline of the subtree it's the root of. First, the kernel will search for the first eligible node by going from the root leftwards. Once an eligible node is obtained, the kernel then searches for the task that has the earliest virtual deadline in the subtree. The eligible node with the deadline equal to the minimum deadline of the subtree is the one selected for execution, finally explaining the name of the algorithm. This search takes place in O(log n) time.

And there we go! This is the new kernel scheduler algorithm in a nutshell, leaving out a lot of the math and the assumptions needed to make it work. I'm sure that work is still ongoing, and it's exciting to see how the implementation will change in the future.

Now that I am done with this semester, I'm hopefully able to come back to writing more. Thanks for sticking along!