On Wed, Jul 17, 2024, Joel Fernandes wrote:
> On Tue, Jul 16, 2024 at 7:44 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Fri, Jul 12, 2024, Steven Rostedt wrote:
> > > On Fri, 12 Jul 2024 09:44:16 -0700
> > > Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > > > All we need is a notifier that gets called at every VMEXIT.
> > > >
> > > > Why?  The only argument I've seen for needing to hook VM-Exit is so that the
> > > > host can speculatively boost the priority of the vCPU when delivering an IRQ,
> > > > but (a) I'm unconvinced that is necessary, i.e. that the vCPU needs to be boosted
> > > > _before_ the guest IRQ handler is invoked and (b) it has almost no benefit on
> > > > modern hardware that supports posted interrupts and IPI virtualization, i.e. for
> > > > which there will be no VM-Exit.
> > >
> > > No. The speculative boost was for something else, but slightly
> > > related. I guess the idea there was to have the interrupt coming in
> > > boost the vCPU because the interrupt could be waking an RT task. It may
> > > still be something needed, but that's not what I'm talking about here.
> > >
> > > The idea here is when an RT task is scheduled in on the guest, we want
> > > to lazily boost it. As long as the vCPU is running on the CPU, we do
> > > not need to do anything. If the RT task is scheduled for a very short
> > > time, it should not need to call any hypercall. It would set the shared
> > > memory to the new priority when the RT task is scheduled, and then put
> > > back the lower priority when it is scheduled out and a SCHED_OTHER task
> > > is scheduled in.
> > >
> > > Now if the vCPU gets preempted, it is this moment that we need the host
> > > kernel to look at the current priority of the task thread running on
> > > the vCPU. If it is an RT task, we need to boost the vCPU to that
> > > priority, so that a lower priority host thread does not interrupt it.
> >
> > I got all that, but I still don't see any need to hook VM-Exit.  If the vCPU gets
> > preempted, the host scheduler is already getting "notified", otherwise the vCPU
> > would still be scheduled in, i.e. wouldn't have been preempted.
>
> What you're saying is the scheduler should change the priority of the
> vCPU thread dynamically. That's really not the job of the scheduler.
> The user of the scheduler is what changes the priority of threads, not
> the scheduler itself.

No.  If we go the proposed route[*] of adding a data structure that lets userspace
and/or the guest express/adjust the task's priority, then the scheduler simply
checks that data structure when querying the priority of a task.

[*] https://lore.kernel.org/all/ZpFWfInsXQdPJC0V@xxxxxxxxxx
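
To make the distinction concrete, here is a rough userspace sketch of the "scheduler queries a shared data structure" idea, combined with the lazy-boost write pattern described earlier in the thread. It is not kernel code and not the proposed interface: struct prio_hint, struct task, task_effective_prio(), guest_sched_in() and guest_sched_out() are all hypothetical names, and the convention that a larger number means higher priority is an assumption made only for this example.

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical hint area shared between the guest (or userspace) and the
 * host.  0 means "no request"; a larger value means more urgent.  This
 * layout is an assumption for illustration only.
 */
struct prio_hint {
	_Atomic int requested_prio;
};

/* Hypothetical host-side view of the vCPU thread. */
struct task {
	int base_prio;            /* priority assigned by the host admin */
	struct prio_hint *hint;   /* NULL if no hint area is registered */
};

/*
 * Host side: called wherever the scheduler needs the task's priority,
 * e.g. when deciding whether another thread may preempt the vCPU.
 * Nothing is changed here; the hint is only read, so no VM-Exit hook
 * and no hypercall are involved.
 */
static int task_effective_prio(const struct task *t)
{
	int prio = t->base_prio;

	if (t->hint != NULL) {
		int req = atomic_load_explicit(&t->hint->requested_prio,
					       memory_order_acquire);
		if (req > prio)
			prio = req;
	}
	return prio;
}

/* Guest side: an RT task is scheduled in, so publish its priority. */
static void guest_sched_in(struct prio_hint *h, int rt_prio)
{
	atomic_store_explicit(&h->requested_prio, rt_prio,
			      memory_order_release);
}

/* Guest side: a SCHED_OTHER task is scheduled in, so drop the request. */
static void guest_sched_out(struct prio_hint *h)
{
	atomic_store_explicit(&h->requested_prio, 0, memory_order_release);
}
```

In this sketch the guest only ever writes to memory, and the host only reads it at the point where it already has to make a scheduling decision. A real implementation would additionally have to validate and clamp whatever the guest writes, which is omitted here.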