Re: [RFC PATCH 0/8] Dynamic vcpu priority management in kvm

Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> · Wed, 24 Jan 2024 20:08:56 -0500

Hi David,

On Wed, Jan 24, 2024 at 12:06 PM David Vernet <void@xxxxxxxxxxxxx> wrote:
>
[...]
> > There might be a caveat to the unboosting path though needing a hypercall and I
> > need to check with Vineeth on his latest code whether it needs a hypercall, but
> > we could probably figure that out. In the latest design, one thing I know is
> > that we just have to force a VMEXIT for both boosting and unboosting. Well for
> > boosting, the VMEXIT just happens automatically due to vCPU preemption, but for
> > unboosting it may not.
>
> As mentioned above, I think we'd need to add UAPI for setting state from
> the guest scheduler, even if we didn't use a hypercall to induce a
> VMEXIT, right?

I see what you mean now. I'll think more about it. My immediate
thought is to load BPF programs that trigger at appropriate points in
the guest. For instance, we already have tracepoints for preemption
disabling, which I added upstream something like 8 years ago. And
sched_switch already knows when we switch to RT, which we could
leverage in the guest. What I'm envisioning is that the BPF program,
when it runs, sets some shared memory state in whatever format it
desires.
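
To make the shared-state idea a bit more concrete, here is a minimal
userspace C sketch of the kind of state such a program might maintain.
Everything in it is an assumption for illustration: the struct layout,
the flag names and the helper functions are hypothetical and are not
taken from the patch series, and a real version would live in BPF
programs attached to the guest tracepoints, with the host reading the
state on VMEXIT. Nested preempt_disable() is deliberately not handled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reasons a vCPU might ask the host for a boost. */
enum boost_reason {
	BOOST_PREEMPT_DISABLED = 1u << 0, /* guest has preemption disabled */
	BOOST_RT_TASK          = 1u << 1, /* guest is running an RT task */
};

/* Hypothetical per-vCPU state shared between guest and host. */
struct vcpu_shared_state {
	_Atomic uint32_t boost_reasons; /* bitmask of enum boost_reason */
};

/* Guest side: what a program on the preempt-disable tracepoint might do. */
static inline void guest_preempt_disable(struct vcpu_shared_state *s)
{
	atomic_fetch_or(&s->boost_reasons, (uint32_t)BOOST_PREEMPT_DISABLED);
}

/* Guest side: what a program on the preempt-enable tracepoint might do. */
static inline void guest_preempt_enable(struct vcpu_shared_state *s)
{
	atomic_fetch_and(&s->boost_reasons, ~(uint32_t)BOOST_PREEMPT_DISABLED);
}

/* Guest side: sched_switch hook; next_is_rt describes the incoming task. */
static inline void guest_sched_switch(struct vcpu_shared_state *s,
				      bool next_is_rt)
{
	if (next_is_rt)
		atomic_fetch_or(&s->boost_reasons, (uint32_t)BOOST_RT_TASK);
	else
		atomic_fetch_and(&s->boost_reasons, ~(uint32_t)BOOST_RT_TASK);
}

/* Host side: consulted on VMEXIT; boost while any reason is still set. */
static inline bool host_should_boost(struct vcpu_shared_state *s)
{
	return atomic_load(&s->boost_reasons) != 0;
}
```

The point of the bitmask is that unboosting only happens once every
reason has been cleared, so an RT task that briefly disables and
re-enables preemption stays boosted.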

By the way, one crazy idea about loading BPF programs into a guest:
maybe KVM can pass along the BPF programs to be loaded to the guest?
The VMM can do that. The nice thing there is that the host would be
the only one responsible for the BPF programs. I am not sure if that
makes sense, so please let me know what you think. I guess the VMM
should also pass additional metadata, such as which tracepoints in the
guest to hook.
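
As a rough sketch of what that metadata could look like, assume the VMM
hands the guest a table of descriptors, each naming a tracepoint and
carrying an opaque program blob. The descriptor struct, the helper and
the idea of the guest checking names against a known list are all
hypothetical; nothing like this exists in KVM today as far as I know.
The tracepoint names are the existing sched and preemptirq ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor the VMM would pass to the guest per program. */
struct guest_bpf_desc {
	const char *tracepoint;     /* "category:name" of the hook point */
	const unsigned char *prog;  /* opaque BPF object bytes from the host */
	size_t prog_len;            /* length of prog in bytes */
};

/* Hook points this sketch assumes the guest is prepared to attach to. */
static const char *const known_tracepoints[] = {
	"sched:sched_switch",
	"preemptirq:preempt_disable",
	"preemptirq:preempt_enable",
};

/* Guest-side sanity check before trying to load and attach a program. */
static inline bool guest_bpf_desc_valid(const struct guest_bpf_desc *d)
{
	size_t i;

	if (!d || !d->tracepoint || !d->prog || d->prog_len == 0)
		return false;

	for (i = 0; i < sizeof(known_tracepoints) / sizeof(known_tracepoints[0]); i++) {
		if (strcmp(d->tracepoint, known_tracepoints[i]) == 0)
			return true;
	}
	return false;
}
```

Loading and attaching would still go through the normal BPF syscall
path in the guest, so the usual verifier checks apply there.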

> > In any case, can we not just force a VMEXIT from relevant path within the guest,
> > again using a BPF program? I don't know what the BPF prog to do that would look
> > like, but I was envisioning we would call a BPF prog from within a guest if
> > needed at relevant point (example, return to guest userspace).
>
> I agree it would be useful to have a kfunc that could be used to force a
> VMEXIT if we e.g. need to trigger a resched or something. In general
> that seems like a pretty reasonable building block for something like
> this. I expect there are use cases where doing everything async would be
> useful as well. We'll have to see what works well in experimentation.

Sure.

> > >> Still there is a lot of merit to sharing memory with BPF and let BPF decide
> > >> the format of the shared memory, than baking it into the kernel... so thanks
> > >> for bringing this up! Let's talk more about it... Oh, and there's my LSFMMBPF
> > >> invitation request ;-) ;-).
> > >
> > > Discussing this BPF feature at LSFMMBPF is a great idea -- I'll submit a
> > > proposal for it and cc you. I looked and couldn't seem to find the
> > > thread for your LSFMMBPF proposal. Would you mind please sending a link?
> >
> > I actually have not even submitted one for LSFMM but my management is supportive
> > of my visit. Do you want to go ahead and submit one with all of us included in
> > the proposal? And I am again sorry for the late reply and hopefully we did not
> > miss any deadlines. Also on related note, there is interest in sched_ext for
>
> I see that you submitted a proposal in [2] yesterday. Thanks for writing
> it up, it looks great and I'll comment on that thread adding a +1 for
> the discussion.
>
> [2]: https://lore.kernel.org/all/653c2448-614e-48d6-af31-c5920d688f3e@xxxxxxxxxxxxxxxxx/
>
> No worries at all about the reply latency. Thank you for being so open
> to discussing different approaches, and for driving the discussion. I
> think this could be a very powerful feature for the kernel so I'm
> pretty excited to further flesh out the design and figure out what makes
> the most sense here.

Great!

> > As mentioned above, for boosting, there is no hypercall. The VMEXIT is induced
> > by host preemption.
>
> I expect I am indeed missing something then, as mentioned above. VMEXIT
> aside, we still need some UAPI for the shared structure between the
> guest and host where the guest indicates its need for boosting, no?

Yes, you are right. It is clearer now what you were referring to
with UAPI. I think we need to figure that issue out. But if we can make
the VMM load BPF programs, then the host can completely decide how to
structure the shared memory.

> > > 2. What is the cost we're imposing on users if we force paravirt to be
> > >    done through BPF? Is this prohibitively high?
> > >
> > > There is certainly a nonzero cost. As you pointed out, right now Android
> > > apparently doesn't use much BPF, and adding the requisite logic to use
> > > and manage BPF programs is not insignificant.
> > >
> > > Is that cost prohibitively high? I would say no. BPF should be fully
> > > supported on aarch64 at this point, so it's really a user space problem.
> > > Managing the system is what user space does best, and many other
> > > ecosystems have managed to integrate BPF to great effect. So while the
> > > cost is certainly nonzero, I think there's a reasonable argument to be
> > > made that it's not prohibitively high.
> >
> > Yes, I think it is doable.
> >
> > Glad to be able to finally reply, and I shall prioritize this thread more on my
> > side moving forward.
>
> Thanks for your detailed reply, and happy belated birthday :-)

Thank you!!! :-)

 - Joel