On Tue, 16 Jul 2024 16:44:05 -0700
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

> >
> > Now if the vCPU gets preempted, it is this moment that we need the host
> > kernel to look at the current priority of the task thread running on
> > the vCPU. If it is an RT task, we need to boost the vCPU to that
> > priority, so that a lower priority host thread does not interrupt it.
>
> I got all that, but I still don't see any need to hook VM-Exit. If the vCPU gets
> preempted, the host scheduler is already getting "notified", otherwise the vCPU
> would still be scheduled in, i.e. wouldn't have been preempted.

The guest wants to lazily up its priority when needed. So, it changes its
priority in this shared memory, but the host doesn't know about the raised
priority, and decides to preempt it (where it would not if it knew the
priority was raised). Then it exits into the host via VMEXIT. When else is
the host going to know of this priority change?

>
> > The host should also set a bit in the shared memory to tell the guest
> > that it was boosted. Then when the vCPU schedules a lower priority task
> > than what is in shared memory, and the bit is set that tells the guest
> > the host boosted the vCPU, it needs to make a hypercall to tell the
> > host that it can lower its priority again.
>
> Which again doesn't _need_ a dedicated/manual VM-Exit. E.g. why force the host
> to reasses the priority instead of simply waiting until the next reschedule? If
> the host is running tickless, then presumably there is a scheduling entity running
> on a different pCPU, i.e. that can react to vCPU priority changes without needing
> a VM-Exit.

This is done in a shared memory location. The guest can raise and lower its
priority by writing into the shared memory. It may raise it and lower it
back without the host ever knowing. No hypercall needed.

But if it raises its priority, and the host decides to schedule something
else because it is unaware of the raised priority, it will preempt the vCPU.
Then when it exits into the host (via VMEXIT), this is the first time the
host will know that its priority was raised, and then we can call something
like rt_mutex_setprio() to lazily change its priority. It would then also
set a bit to inform the guest that the host knows of the change, and when
the guest lowers its priority, it will now need to make a hypercall to tell
the host its priority is low again, and it's OK to preempt it normally.

This is similar to how some architectures do lazy irq disabling, where they
only set some memory that says interrupts are disabled. Interrupts only
really get disabled if an interrupt goes off and the code sees it's "soft
disabled", and then it will disable interrupts. When the interrupts are
enabled again, it then calls the interrupt handler.

What are you suggesting to do for this fast way of increasing and
decreasing the priority of tasks?

-- Steve
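
To make the flow above concrete, here is a rough userspace simulation of
the lazy boost protocol. It is only a sketch: the structure layout, the
function names and the "higher number means higher priority" convention
are all made up for illustration, and a plain assignment stands in for
whatever the host would really call (something like rt_mutex_setprio()).
Real shared memory would also need atomics/barriers, which are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NORMAL_PRIO 0	/* assumption: 0 = normal, larger = more urgent */

/* Hypothetical layout of the memory shared by guest and host. */
struct shared_prio {
	int  guest_prio;	/* guest writes: priority it currently wants */
	bool host_boosted;	/* host writes: "I saw it and boosted the vCPU" */
};

/* Host-side view of one vCPU thread (simulation only). */
struct vcpu {
	struct shared_prio shm;
	int host_prio;		/* priority the host scheduler is using */
	int hypercalls;		/* number of deliberate exits by the guest */
};

/* Guest -> host: "my priority went down, stop treating me as boosted". */
static void hypercall_unboost(struct vcpu *v)
{
	assert(v->shm.host_boosted);
	v->hypercalls++;
	v->host_prio = v->shm.guest_prio;	/* stand-in for the real setprio */
	/* Stay flagged while still above normal, so the next drop is reported. */
	v->shm.host_boosted = v->host_prio > NORMAL_PRIO;
}

/* Guest: the fast path is a plain store to shared memory, no exit. */
static void guest_set_prio(struct vcpu *v, int prio)
{
	bool lowering = prio < v->shm.guest_prio;

	v->shm.guest_prio = prio;
	/* Only pay for a hypercall if the host actually acted on the raise. */
	if (lowering && v->shm.host_boosted)
		hypercall_unboost(v);
}

/* Host: runs once the vCPU has exited because the host preempted it. */
static void host_vmexit_on_preempt(struct vcpu *v)
{
	if (v->shm.guest_prio > v->host_prio) {
		v->host_prio = v->shm.guest_prio;	/* lazy boost */
		v->shm.host_boosted = true;
	}
}

/* Walk through both cases; returns the number of hypercalls made. */
static int demo(void)
{
	struct vcpu v = { .shm = { NORMAL_PRIO, false },
			  .host_prio = NORMAL_PRIO, .hypercalls = 0 };

	/* Raise and lower with no preemption: the host never finds out. */
	guest_set_prio(&v, 50);
	guest_set_prio(&v, NORMAL_PRIO);
	printf("quiet:   host_prio=%d hypercalls=%d\n", v.host_prio, v.hypercalls);

	/* Raise, get preempted, host boosts; lowering now needs a hypercall. */
	guest_set_prio(&v, 50);
	host_vmexit_on_preempt(&v);
	printf("boosted: host_prio=%d hypercalls=%d\n", v.host_prio, v.hypercalls);
	guest_set_prio(&v, NORMAL_PRIO);
	printf("lowered: host_prio=%d hypercalls=%d\n", v.host_prio, v.hypercalls);

	return v.hypercalls;
}
```

Calling demo() from a main() shows the point of the scheme: the first
raise/lower pair costs nothing beyond two stores, and the guest only makes
a hypercall in the second case, where the host had already boosted the
vCPU after preempting it.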