On Thu, Jul 18, 2024 at 12:20 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Wed, 17 Jul 2024 10:52:33 -0400
> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> > We could possibly add a new sched class that has a dynamic priority.
>
> It wouldn't need to be a new sched class. This could work with just a
> task_struct flag.
>
> It would only need to be checked in pick_next_task() and
> try_to_wake_up(). It would require that the shared memory has to be
> allocated by the host kernel and always present (unlike rseq). But this
> coming from a virtio device driver, that shouldn't be a problem.
>
> If this flag is set on current, then the first thing that
> pick_next_task() should do is to see if it needs to change current's
> priority and policy (via a callback to the driver). And then it can
> decide what task to pick, as if current was boosted, it could very well
> be the next task again.
>
> In try_to_wake_up(), if the task waking up has this flag set, it could
> boost it via an option set by the virtio device. This would allow it to
> preempt the current process if necessary and get on the CPU. Then the
> guest would be required to lower its priority if the boost was not
> needed.
>
> Hmm, this could work.

For what it's worth, I proposed something somewhat conceptually
similar before:
https://lore.kernel.org/kvm/CABCjUKBXCFO4-cXAUdbYEKMz4VyvZ5hD-1yP9H7S7eL8XsqO-g@xxxxxxxxxxxxxx/T/

Guest VCPUs would report their preempt_count to the host, and the host
would use that to try not to preempt a VCPU that was in a critical
section (with some simple safeguards in case the guest was not well
behaved).
(It worked by adding a "may_preempt" notifier that would get called in
schedule(), whose return value would determine whether we'd try to
schedule away from current or not.)

It was VM specific, but the same idea could be made to work for
generic userspace tasks.

-- Suleiman
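
To make the "may_preempt" idea above concrete, here is a rough userspace model of the decision such a notifier might make. It is only a sketch, not code from the linked series: struct vcpu_hint, its fields, MAX_DEFERRALS and the function body are all hypothetical, and the "deferral budget" is just one assumed form the safeguard against a misbehaving guest could take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-VCPU state. In a real design guest_preempt_count
 * would live in memory shared with the guest; here it is a plain field.
 */
struct vcpu_hint {
	uint32_t guest_preempt_count;	/* guest-reported; nonzero = critical section */
	uint32_t deferrals;		/* host-side count of consecutive deferrals */
};

/* Assumed safeguard: never defer preemption more than this many times in a row. */
#define MAX_DEFERRALS 3

/*
 * Model of a "may_preempt" notifier called from the scheduling path.
 * Returns true if the host may schedule away from this VCPU now,
 * false if it should let the VCPU keep running a little longer.
 */
static bool may_preempt(struct vcpu_hint *v)
{
	assert(v != NULL);

	/* Guest is not in a critical section: preempt freely. */
	if (v->guest_preempt_count == 0) {
		v->deferrals = 0;
		return true;
	}

	/* Guest has claimed a critical section for too long: stop trusting it. */
	if (v->deferrals >= MAX_DEFERRALS) {
		v->deferrals = 0;
		return true;
	}

	/* Defer preemption once more and remember that we did. */
	v->deferrals++;
	return false;
}
```

With MAX_DEFERRALS set to 3, a guest that keeps reporting a nonzero preempt_count gets three deferrals and is preempted on the fourth check, so a buggy or malicious guest cannot hold the CPU indefinitely. For generic userspace tasks the same check would apply, with the count published by the task instead of a VCPU.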