Re: [RFC PATCH v2 0/5] Paravirt Scheduling (Dynamic vcpu priority management)

Steven Rostedt <rostedt@xxxxxxxxxxx> · Tue, 16 Jul 2024 20:13:26 -0400

On Tue, 16 Jul 2024 16:44:05 -0700
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > 
> > Now if the vCPU gets preempted, it is this moment that we need the host
> > kernel to look at the current priority of the task thread running on
> > the vCPU. If it is an RT task, we need to boost the vCPU to that
> > priority, so that a lower priority host thread does not interrupt it.  
> 
> I got all that, but I still don't see any need to hook VM-Exit.  If the vCPU gets
> preempted, the host scheduler is already getting "notified", otherwise the vCPU
> would still be scheduled in, i.e. wouldn't have been preempted.

The guest wants to lazily raise its priority when needed. So it changes its
priority in this shared memory, but the host doesn't know about the raised
priority, and decides to preempt it (which it would not do if it knew the
priority was raised). Then it exits into the host via VMEXIT. When else is
the host going to learn of this priority change?

> 
> > The host should also set a bit in the shared memory to tell the guest
> > that it was boosted. Then when the vCPU schedules a lower priority task
> > than what is in shared memory, and the bit is set that tells the guest
> > the host boosted the vCPU, it needs to make a hypercall to tell the
> > host that it can lower its priority again.  
> 
> Which again doesn't _need_ a dedicated/manual VM-Exit.  E.g. why force the host
> to reassess the priority instead of simply waiting until the next reschedule?  If
> the host is running tickless, then presumably there is a scheduling entity running
> on a different pCPU, i.e. that can react to vCPU priority changes without needing
> a VM-Exit.

This is done in a shared memory location. The guest can raise and lower its
priority by writing to the shared memory. It may raise it and lower it back
without the host ever knowing. No hypercall needed.

But if it raises its priority, and the host, unaware of the raised
priority, decides to preempt it, then the exit into the host (via VMEXIT)
is the first time the host can learn that its priority was raised, and
then we can call something like rt_mutex_setprio() to lazily change its
priority. The host would then also set a bit to inform the guest that the
host knows of the change, and when the guest lowers its priority, it will
now need to make a hypercall to tell the host its priority is low again,
and it's OK to preempt it normally.
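
To make the protocol concrete, here is a rough user-space sketch in C. It
is not code from the patch series. The struct layout, the function names,
the hypercall counter, and the "bigger number means higher priority"
convention are all assumptions made up for illustration, and the host side
is modeled with a plain variable where something like rt_mutex_setprio()
would be called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout of the memory shared between guest and host. */
struct pv_sched_shared {
	int  guest_prio;   /* written by the guest, read by the host */
	bool host_boosted; /* written by the host, read by the guest */
};

/* Stand-ins for host state; assumed names, not real kernel symbols. */
static int host_vcpu_prio; /* priority the host uses for the vCPU thread */
static int hypercalls;     /* how many "hypercalls" the guest had to make */

/* Guest: raising the priority is just a store, no exit needed. */
static void guest_raise_prio(struct pv_sched_shared *s, int prio)
{
	assert(s != NULL);
	s->guest_prio = prio;
}

/* Guest: lowering needs a hypercall only if the host boosted us. */
static void guest_lower_prio(struct pv_sched_shared *s, int prio)
{
	assert(s != NULL);
	s->guest_prio = prio;
	if (s->host_boosted) {
		/* Models the hypercall: host drops the boost. */
		hypercalls++;
		host_vcpu_prio = prio;
		s->host_boosted = false;
	}
}

/* Host: run on the VM-exit path after the vCPU was preempted. */
static void host_on_vmexit(struct pv_sched_shared *s)
{
	assert(s != NULL);
	if (s->guest_prio > host_vcpu_prio) {
		/* Where something like rt_mutex_setprio() would go. */
		host_vcpu_prio = s->guest_prio;
		s->host_boosted = true;
	}
}
```

In this model, a raise followed by a lower with no exit in between costs
nothing on the host side. Only the case where the host actually looked at
the shared memory and boosted the vCPU ends in a hypercall.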

This is similar to how some architectures do lazy irq disabling. They
only set some memory that says interrupts are disabled. Interrupts only
really get disabled if an interrupt goes off and the code sees it's "soft
disabled", and then disables interrupts for real. When interrupts are
enabled again, the handler for the pending interrupt is then called.
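
The lazy irq disabling pattern can be sketched the same way. This is a toy
model in C, not any architecture's actual implementation. All names are
invented for illustration, the "hardware" mask is just a variable, and the
model tracks only a single pending interrupt.

```c
#include <assert.h>
#include <stdbool.h>

static bool soft_disabled; /* the cheap in-memory "irqs are off" flag */
static bool hard_disabled; /* models the real CPU interrupt mask */
static bool irq_pending;   /* an interrupt arrived while soft disabled */
static int  handled;       /* number of times the handler has run */

static void irq_handler(void)
{
	/* The handler must never run while soft disabled. */
	assert(!soft_disabled);
	handled++;
}

/* Disabling is only a store to memory; the hardware is not touched. */
static void lazy_irq_disable(void)
{
	soft_disabled = true;
}

/* Models the low-level entry point taken when an interrupt fires. */
static void hw_interrupt(void)
{
	if (hard_disabled)
		return; /* masked for real; this toy model just drops it */
	if (soft_disabled) {
		/* First interrupt while soft disabled: now mask for real. */
		irq_pending = true;
		hard_disabled = true;
		return;
	}
	irq_handler();
}

/* Enabling replays the pending interrupt, if there was one. */
static void lazy_irq_enable(void)
{
	soft_disabled = false;
	if (irq_pending) {
		irq_pending = false;
		hard_disabled = false;
		irq_handler();
	}
}
```

As with the priority case, the common path (disable, then enable, with
nothing arriving in between) never touches the expensive state.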

What are you suggesting we do instead for this fast way of increasing and
decreasing the priority of tasks?

-- Steve