Re: RFC: A KVM-specific alternative to UserfaultFD

Oliver Upton <oliver.upton@xxxxxxxxx> · Wed, 8 Nov 2023 01:27:12 +0000

On Tue, Nov 07, 2023 at 01:34:34PM -0800, David Matlack wrote:
> On Tue, Nov 7, 2023 at 1:10 PM Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
> Thanks Oliver. Maybe I'm being dense but I'm still not understanding
> how VGIC and UFFD interact :). I understand that VGIC is unaware of
> UFFD, but fundamentally they must interact in some way during
> post-copy. Can you spell out the sequence of events?

Well, it doesn't help that my abbreviated explanation glossed over some
details. Here's the verbose version, and I'm sure Marc will have a set
of corrections too :) What I meant is that there's no _explicit_
interaction between UFFD and the various bits of GIC that need to touch
guest memory.

The GIC redistributors contain a set of MMIO registers that are
accessible through the KVM_GET_DEVICE_ATTR and KVM_SET_DEVICE_ATTR
ioctls. Writes to these are reflected directly into the KVM
representation, no biggie there.

One of the registers (GICR_PENDBASER) holds a pointer to a region of
guest memory containing a bitmap of pending LPIs managed by the
redistributor. The ITS takes this to the extreme, as it is effectively a
bunch of page tables for interrupts. All of this state actually lives in
a KVM representation, and is only flushed out to guest memory when
explicitly told to do so by userspace.
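
To make the pending table layout concrete, here is a minimal sketch of
how the bit for a given LPI could be located from a GICR_PENDBASER
value. It assumes the GICv3 layout as I understand it: the physical
address lives in bits [51:16] of the register, and the bit for INTID n
sits at byte n / 8, bit n % 8 from that base. The struct and function
names are made up for illustration and are not KVM's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: GICR_PENDBASER.Physical_Address occupies bits [51:16]. */
#define PENDBASER_ADDR_MASK (((1ULL << 52) - 1) & ~((1ULL << 16) - 1))

/* LPI INTIDs start at 8192; lower INTIDs are SGIs, PPIs and SPIs. */
#define LPI_INTID_BASE 8192u

/* Hypothetical helper type: where the pending bit of one LPI lives. */
struct lpi_pending_loc {
	uint64_t gpa;	/* guest physical address of the byte holding the bit */
	uint8_t bit;	/* bit position within that byte */
};

/* Compute the guest address and bit number of an LPI's pending bit. */
static inline struct lpi_pending_loc
lpi_pending_locate(uint64_t pendbaser, uint32_t intid)
{
	struct lpi_pending_loc loc;

	assert(intid >= LPI_INTID_BASE);
	loc.gpa = (pendbaser & PENDBASER_ADDR_MASK) + intid / 8;
	loc.bit = intid % 8;
	return loc;
}

/* Test the pending bit in a local copy of the table (index 0 == INTID 0). */
static inline bool lpi_is_pending(const uint8_t *table, uint32_t intid)
{
	return (table[intid / 8] >> (intid % 8)) & 1;
}
```

The point of the sketch is only that the byte at loc.gpa is ordinary
guest memory, so whether it is present on the target depends entirely
on what post-copy has transferred so far.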

On the target, we reread all the info and rebuild the interrupt
translations when userspace calls KVM_DEV_ARM_ITS_RESTORE_TABLES. All of
these guest memory accesses go through kvm_read_guest() and I expect the
usual UFFD handling for non-present pages kicks in from there.
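
As a rough illustration of that implicit interaction, here is a toy
userspace model of a read that faults pages in on demand. It is not
kvm_read_guest() and does not use userfaultfd; every toy_* name is
hypothetical, and the "source" array simply stands in for pages that
would be fetched from the source host during post-copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NR_PAGES  4u

struct toy_guest_mem {
	uint8_t target[TOY_NR_PAGES][TOY_PAGE_SIZE];	/* memory on the target */
	uint8_t source[TOY_NR_PAGES][TOY_PAGE_SIZE];	/* stand-in for the source host */
	bool present[TOY_NR_PAGES];			/* has the page been copied yet? */
	unsigned int faults;				/* number of demand fetches */
};

/* Stand-in for the fault handler: copy one page over and mark it present. */
static inline void toy_fetch_page(struct toy_guest_mem *m, size_t pfn)
{
	memcpy(m->target[pfn], m->source[pfn], TOY_PAGE_SIZE);
	m->present[pfn] = true;
	m->faults++;
}

/*
 * Read len bytes at guest address gpa. The caller never checks page
 * presence; missing pages are fetched as a side effect of the access.
 * Returns 0 on success, -1 if the range is out of bounds.
 */
static inline int toy_read_guest(struct toy_guest_mem *m, uint64_t gpa,
				 void *dst, size_t len)
{
	const uint64_t total = (uint64_t)TOY_NR_PAGES * TOY_PAGE_SIZE;
	uint8_t *out = dst;

	assert(m != NULL && dst != NULL);
	if (gpa > total || len > total - gpa)
		return -1;

	while (len) {
		size_t pfn = gpa / TOY_PAGE_SIZE;
		size_t off = gpa % TOY_PAGE_SIZE;
		size_t chunk = TOY_PAGE_SIZE - off;

		if (chunk > len)
			chunk = len;
		if (!m->present[pfn])
			toy_fetch_page(m, pfn);
		memcpy(out, &m->target[pfn][off], chunk);
		out += chunk;
		gpa += chunk;
		len -= chunk;
	}
	return 0;
}
```

The reader of the tables has no idea post-copy is going on, which is
the property that makes a fault-driven mechanism attractive here.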

> >
> > If UFFD is off the table then it would appear there are two options:
> >
> >  - Instrument these ioctls to request pages not marked as present in the
> >    theorized KVM-owned demand paging interface
> >
> >  - Mandate that userspace has transferred all of the required VGIC / ITS
> >    pages before resuming on the target
> >
> > The former increases the maintenance burden of supporting post-copy
> > upstream and the latter *will* fail spectacularly. Ideally we use a
> > mechanism that doesn't require us to think about instrumenting
> > post-copy for every new widget that we will want to virtualize.
> >
> > > So in the short term we could provide a partial solution for
> > > HugeTLB-backed VMs (at least unblocking Google's use-case) and in the
> > > long-term there's line of sight of a unified solution.
> >
> > Who do we expect to look after the upstreamed short-term solution once
> > Google has moved on to something else?
> 
> Note, the proposed long-term solution you are replying to is an
> extension of the short-term solution, not something else.

Ack, I just feel rather strongly that the priority should be making
guest_memfd work with whatever post-copy scheme we devise. Once we
settle on a UAPI that works for the new and shiny thing then it's easier
to rationalize applying the UAPI change to other memory backing types.

-- 
Thanks,
Oliver