Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors

Aaron Lewis <aaronlewis@xxxxxxxxxx> · Mon, 19 Apr 2021 09:47:19 -0700

> > Add a fallback mechanism to the in-kernel instruction emulator that
> > allows userspace the opportunity to process an instruction the emulator
> > was unable to.  When the in-kernel instruction emulator fails to process
> > an instruction it will either inject a #UD into the guest or exit to
> > userspace with exit reason KVM_INTERNAL_ERROR.  This is because it does
> > not know how to proceed in an appropriate manner.  This feature lets
> > userspace get involved to see if it can figure out a better path
> > forward.
>
> Given that you are intending to try and handle the instruction in
> user-space, it seems a little odd to overload the
> KVM_EXIT_INTERNAL_ERROR/KVM_INTERNAL_ERROR_EMULATION exit reason/sub
> error.
>
> Why not add a new exit reason, particularly given that the caller has to
> enable the capability to get the relevant data? (It would also remove
> the need for the flag field and any mechanism for packing multiple bits
> of detail into the structure.)

I considered that, but I opted for the extensibility of the exiting
KVM_EXIT_INTERNAL_ERROR instead.  To me it was six of one or half a
dozen of the other.  With either strategy I still wanted to provide
for future extensibility, and had a flags field in place.  That way we
can add to this in the future if we find something that is missing
(ie: potentially wanting a way to mark dirty pages, possibly passing a
fault address, etc...)

> > +/*
> > + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
> > + * to describe what is contained in the exit struct.  The flags are used to
> > + * describe it's contents, and the contents should be in ascending numerical
> > + * order of the flag values.  For example, if the flag
> > + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
> > + * length and instruction bytes would be expected to show up first because this
> > + * flag has the lowest numerical value (1) of all the other flags.
> > + */
> > +#define KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES (1ULL << 0)
> > +
> >  /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
> >  struct kvm_run {
> >       /* in */
> > @@ -382,6 +393,14 @@ struct kvm_run {
> >                       __u32 ndata;
> >                       __u64 data[16];
> >               } internal;
> > +             /* KVM_EXIT_INTERNAL_ERROR, too (not 2) */
> > +             struct {
> > +                     __u32 suberror;
> > +                     __u32 ndata;
> > +                     __u64 flags;
> > +                     __u8  insn_size;
> > +                     __u8  insn_bytes[15];
> > +             } emulation_failure;
> > +/*
> > + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
> > + * to describe what is contained in the exit struct.  The flags are used to
> > + * describe it's contents, and the contents should be in ascending numerical
> > + * order of the flag values.  For example, if the flag
> > + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
> > + * length and instruction bytes would be expected to show up first because this
> > + * flag has the lowest numerical value (1) of all the other flags.
>
> When adding a new flag, do I steal bytes from insn_bytes[] for my
> associated payload? If so, how many do I have to leave?
>

The emulation_failure struct mirrors the internal struct, so if you
are just adding to what I have, you can safely add up to 16 __u64's.
I'm currently using the size equivalent to 3 of them (flags,
insn_size, insn_bytes), so there should be plenty of space left for
you to add what you need to the end.  Just add the fields you need to
the end of emulation_failure struct, increase 'ndata' to the new
count, add a new flag to 'flags' so we know its contents.