Re: [PATCH] KVM: arm64: Add support for IDSR exits to userspace

Lev Aronsky <aronsky@xxxxxxxxx> · Mon, 23 Mar 2020 09:41:27 +0000

On Mon, Mar 23, 2020 at 09:07:12AM +0000, Marc Zyngier wrote:
> On 2020-03-23 08:22, Lev Aronsky wrote:
> > On Sun, Mar 22, 2020 at 05:29:52PM +0000, Marc Zyngier wrote:
> > > On 2020-03-22 14:20, Lev Aronsky wrote:
> > > > On Sun, Mar 22, 2020 at 11:34:35AM +0000, Marc Zyngier wrote:
> > > > > Hi Lev,
> > > > >
> > > > > Thanks for this.
> > > > >
> > > > > On 2020-03-22 09:36, aronsky@xxxxxxxxx wrote:
> > > > > > From: Lev Aronsky <aronsky@xxxxxxxxx>
> > > > > >
> > > > > > As per ARM DDI 0487E.a, section D12.3.2, there can be various
> > > > > > system registers that are IMPLEMENTATION DEFINED.
> > > > > >
> > > > > > So far, KVM would inject an undefined instruction into the guest,
> > > > > > whenever access to an implementation defined system register (from
> > > > > > here on referred to as IDSR) was trapped. This makes sense, since a
> > > > > > general-purpose OS probably shouldn't rely on the existence of IDSRs.
> > > > > >
> > > > > > This patch introduces an option to give userspace a chance to handle
> > > > > > these traps. This can be used to emulate specific architectures, and
> > > > > > virtualize operating systems that rely on the existence of specific
> > > > > > IDSRs.
> > > > >
> > > > > Do you have an example of such operating systems? Also, do you have
> > > > > an example where userspace could actually do something useful about
> > > > > such access, other than maybe treating it as RAZ/WI?
> > > >
> > > > I ran into the issue when working on our company's project, which aims
> > > > to emulate Apple's iOS under QEMU. While emulation currently works
> > > > nicely, we were looking to improve performance by using hardware
> > > > virtualization, and that's when we ran into the issue of
> > > > implementation-defined system registers, since iOS uses those. Frankly,
> > > > in our case, we don't really know the purpose of those registers, as far
> > > > as the iOS kernel is concerned. When emulating them, we treat them as
> > > > simple 64-bit storage areas. It's possible that treating them as RAZ/WI
> > > > would work, as well.
> > > 
> > > It's not really reassuring, is it? :-/
> > 
> > It's not perfect, but also not too bad - the system is booting as
> > expected. It might not be a 100% perfect emulation of a real device
> > without a proper implementation of those registers, but it's good enough
> > in our case.
> 
> Hum. OK, I guess.
> 
> > > > > > Similarly to the recently introduced NISV exits, this is an ABI to
> > > > > > userspace, and comes with a matching new capability that allows the
> > > > > > configuration of the behavior.
> > > > >
> > > > > These are different issues: one is a shortcoming of the architecture,
> > > > > the other a shortcoming (or hyperspecialization) of the guest OS.
> > > >
> > > > You're correct. I mentioned the NISV exits because my code is modeled
> > > > after them, and was hoping it would make following my changes easier.
> > > 
> > > It does. IT is just that wou did seem to conflate the two things,
> > > which
> > > triggered an allergic reaction... ;-)
> > > 
> > > > > > This patch introduces KVM_CAP_ARM_IDSR_TO_USER, which can be used to
> > > > > > enable userspace exits due to access to IDSRs. Additionally, it
> > > > > > introduces a matching new exit reason, KVM_EXIT_ARM_IDSR, to report
> > > > > > those events to userspace. Userspace can choose to emulate the access
> > > > > > based on the architecture requirements, or refuse to emulate it and
> > > > >
> > > > > s/architecture/implementation/
> > > >
> > > > Sorry for the newbie question - but how do I change the description now?
> > > > I will upload an updated patch, based on your following comments, but
> > > > I'm not sure how to change the original commit description the correct
> > > > way.
> > > 
> > > git commit --amend
> > 
> > Thanks. I know about amending commits - I was wondering about what to do
> > with the amended commit. As far as I understand, given there are
> > additional changes I'll have to make, I'd have to make a v2 of the
> > patch, but according to the guides I've seen, I would only add the
> > changelog of the newer version of the patch, so I wasn't sure whether to
> > make the correction in the original description of the patch or not.
> 
> In general, the commit log must be accurate, so you are free to change
> it at any time to reflect what the patch does.
> 
> > 
> > > >
> > > > > > let the kernel continue with the default behavior (injecting an
> > > > > > undefined instruction exception into the guest).
> > > > > >
> > > > > > Signed-off-by: Lev Aronsky <aronsky@xxxxxxxxx>
> > > > > > ---
> > > > > >  arch/arm64/include/asm/kvm_coproc.h |  1 +
> > > > > >  arch/arm64/include/asm/kvm_host.h   |  6 +++
> > > > > >  arch/arm64/kvm/sys_regs.c           | 66 ++++++++++++++++++++++++++++-
> > > > > >  include/uapi/linux/kvm.h            | 14 ++++++
> > > > > >  tools/include/uapi/linux/kvm.h      | 14 ++++++
> > > > > >  virt/kvm/arm/arm.c                  | 11 +++++
> > > > > >  6 files changed, 111 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/arch/arm64/include/asm/kvm_coproc.h
> > > > > > b/arch/arm64/include/asm/kvm_coproc.h
> > > > > > index 0185ee8b8b5e..34b45efffe52 100644
> > > > > > --- a/arch/arm64/include/asm/kvm_coproc.h
> > > > > > +++ b/arch/arm64/include/asm/kvm_coproc.h
> > > > > > @@ -33,6 +33,7 @@ int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct
> > > > > > kvm_run *run);
> > > > > >  int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
> > > > > >  int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
> > > > > >  int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run);
> > > > > > +int kvm_handle_idsr_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
> > > > > >
> > > > > >  #define kvm_coproc_table_init kvm_sys_reg_table_init
> > > > > >  void kvm_sys_reg_table_init(void);
> > > > > > diff --git a/arch/arm64/include/asm/kvm_host.h
> > > > > > b/arch/arm64/include/asm/kvm_host.h
> > > > > > index d87aa609d2b6..951c7e6fec8b 100644
> > > > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > > > @@ -91,6 +91,12 @@ struct kvm_arch {
> > > > > >  	 * supported.
> > > > > >  	 */
> > > > > >  	bool return_nisv_io_abort_to_user;
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * If we encounter an access to an implementation-defined system
> > > > > > +	 * register, report this to user space.
> > > > > > +	 */
> > > > > > +	bool return_idsr_to_user;
> > > > >
> > > > > If we are going to add flags like this, maybe we'd be better off
> > > > > having
> > > > > actual flags as an unsigned long.
> > > >
> > > > I don't mind either way - as I mentioned, I tried to follow the NISV
> > > > exits style, and therefore used `bool`. Should I keep it, change it to
> > > > `unsigned long`, or change both mine and the NISV flag to `unsigned
> > > > long`?
> > > 
> > > Change return_nisv_io_abort_to_user to "unsigned long flags", add a
> > > new
> > > RETURN_NISV_IO_ABORT_TO_USER flag and repaint all the current users
> > > in a preliminary patch.
> > 
> > Will do. This means there will be two patchsets, right?
> 
> One patch-set, at least two patches (and a cover letter).

Got it. Still learning the lingo :-)

> 
> [...]
> 
> > > > I must admit that I haven't taken an in-depth look at 32-bit accesses,
> > > > since it was out of scope of our project (which currently focuses only
> > > > on newish versions of iOS). IMO, the patch is useful as is, even without
> > > > the 32-bit support - but I can see how it looks like a job half-done.
> > > 
> > > The 32bit guest support definitely isn't optional. I appreciate that
> > > you
> > > focus on 64bit, but having different behaviours between the two is not
> > > really something I want to entertain.
> > 
> > Totally understandable - but I'm afraid that in this case, the patch
> > will have to wait, as we currently focus our R&D resources on further
> > advancing our project. I will do my best, however, to try and find some
> > time to work on it in the near future. Would booting a 32-bit ARM Linux
> > and using a kernel module to write/read implementation-defined registers
> > be an acceptable approach to test it? Also, just to make sure I
> > understand correctly - we're talking about AArch32 mode of arm64
> > processors, right? Not testing it on 32-bit ARM cores (which, as I
> > understand, are being retired in 5.7 anyway).
> 
> 32bit-on-64bit is what I care about, yes. 32bit *host* is on its way to
> retirement, as you found out. And yes, just booting any 32bit guest and
> checking that you indeed trap access to an IMPDEF register would be enough.
> 
> > 
> > > > Unfortunately, we don't have a 32-bit setup that would allow us to
> > > > properly test the code before submitting. I could copy the changes over,
> > > > but submitting the patches without actually testing them seems
> > > > irresponsible to me.
> > > 
> > > Well, you seem to at least have QEMU. This should be enough to test
> > > *something*. Also, if you're testing this with KVM, what are you
> > > running
> > > KVM on?
> > 
> > We're running it on an ARM cloud server (we were hoping to be able to
> > use SBCs for the project, but iOS uses 16K pages for kernel mode, and we
> > found out (the hard way) that most older/cheaper ARM cores don't support
> > it (Cortex A76 being the first one to support it, IIRC).
> 
> I think there is more than just A76. A55 definitely has TGran16, as well as
> A73, A75, A65 (I've stopped looking in the various TRMs). So definitely
> in the realm of SBCs (I have a quad A55 on my desk, worth $70).

You're right, as usual. But A55 SBCs are apparently hard to find - most
of the SBCs I've seen are A53/A72 (which I thought would be enough,
until we found out about the TGran16 problem), and now that I looked for
A55-based SBCs, I couldn't find one with a big enough memory (we're
looking at 4GB+, so that we can provide the VM at least 2GB and still
have adequate performance).

> 
> > > 
> > > > Interestingly, EL0 access to implementation-defined registers currently
> > > > results in an UNDEF, even though I expected it to be passed on to our
> > > > handler (I saw this behavior with a custom system register we defined
> > > > for direct communication with the hypervisor from a user-mode program we
> > > > developed). I tried following the ARM documentation to figure out what
> > > > could cause such a behavior, but so far I'm at a loss.
> > > 
> > > Here's your answer:
> > > 
> > > "When the value of HCR_EL2.TIDCP is 1, it is IMPLEMENTATION DEFINED
> > > whether
> > > any of this functionality accessed from EL0 is trapped to EL2. If it
> > > is not,
> > > then it is UNDEFINED, and any attempt to access it from EL0
> > > generates an
> > > exception that is taken to EL1."
> > > 
> > > Also, I don't really understand how you define a custom system
> > > register.
> > > Unless you're writing the HW as well, of course.
> > 
> > We are using QEMU as the hypervisor. QEMU allows for definition of
> > arbitrary system registers (based on opc0/opc1/opc2/crm/crn), with
> > custom read/write callback functions. We have a custom machine for
> > iPhone emulation (you can take a look at our code at
> > https://github.com/alephsecurity/xnu-qemu-arm64, if you're interested),
> > so yeah - you could say we're writing the hardware, as well.
> 
> I'm pretty sure this wouldn't work with HW virtualization. I suspect
> this would UNDEF directly on the CPU, leading to an exception being taken
> at EL1 without intervention of the hypervisor. Which makes sense as you'd
> be executing an instruction that the CPU really doesn't implement.

Yes, that seems to be what's happening. We'll have to think of a
different mechanism for trapping access from user-mode straight to the
hypervisor - or, alternatively, move our custom code into the kernel. I
know it's a bit off-topic, but thank you for your advice!

> 
> > > > As far as the whole idea of supporting non-architectural guests... First
> > > > of all, you're absolutely spot-on regarding your guess - as mentioned
> > > > above, we're attempting to virtualize iOS. I believe that in the current
> > > > state of the patch, given its opt-in nature and limited scope
> > > > (implementation-defined registers), its benefits (the potential to
> > > > successfully execute non-architetural guests) outweigh the negatives
> > > > (extra code in the kernel, that's unnecessary to most of the guests
> > > > under ARM/KVM). I would be happy to give more examples (other than iOS)
> > > > that use these registers, but unfortunately, I'm not aware of any such
> > > > systems (though I wouldn't be surprised if Samsung has some
> > > > Exynos-specifc code, but that's just a shot in the dark).
> > > 
> > > Oh, absolutely *all* implementations have IMPDEF registers. You just
> > > have to look at the TRM for any of the ARM implementations, which
> > > document a number of them.
> > > 
> > > And as for having non-architectural code in the kernel, that's of
> > > course,
> > > as you've guessed, a total no-no-no. We made that mistake on 32bit,
> > > and
> > > stopped quickly!
> > > 
> > > My worry is more that you'd need to start faking a lot more than
> > > just a
> > > couple of IMPDEP sysregs. All the ID registers, for a start. Then find
> > > ways to deal with the host's errata, that the guest has to know about
> > > (despite the ID registers being faked). And make sure that no
> > > workaround
> > > that happen in the guest result in anything unexpected.
> > > 
> > > Some of that is possible (the ID register bit, although it requires
> > > some
> > > very serious work), and you could cross your fingers hoping that no
> > > erratum
> > > will get in the way. But it seems pretty fragile...
> > 
> > That's definitely something we're worried about, but so far our efforts
> > have been successful (as in, right now, with the above changes in the
> > kernel and matching updates to QEMU, our iOS image boots to user-mode
> > successfully). We are definitely skipping on some parts of the OS (for
> > example, anything related to TPM). It's possible that as we attempt to
> > improve the accuracy of our emulation/virtualization, we'll run into the
> > problems you're describing.
> 
> Fun! ;-)
> 
> I'm looking forward to your posting of the next version that would include
> 32bit supoport. Another thing I'd ask is a corresponding support patch for
> a userspace VMM of your choice (QEMU, kvmtool) so that I can at least
> exercise
> the support.

Of course! We already have a patched QEMU that works. I'll have to tidy
up the code a bit, extract it out of our fork, and apply it to the
mainline QEMU before submitting. Will take care of it once I deal with
the 32-bit support.

> 
> Thanks,
> 
>         M.
> -- 
> Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm