On Wed, Jun 14, 2023 at 10:35 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu, > > Tagging a globally visible, non-static function as "inline" is odd, to say the > least. I think my eyes glaze over whenever I read the words "translation unit" (my brain certainly does) so I'll have to take your word for it. IIRC last time I tried to mark this function "static" the compiler yelled at me, so removing the "inline" it is. > > +inline void kvm_populate_efault_info(struct kvm_vcpu *vcpu, > > I strongly prefer to avoid "populate" and "efault". Avoid "populate" because > that verb will become stale the instance we do anything else in the helper. > Avoid "efault" because it's imprecise, i.e. this isn't to be used for just any > old -EFAULT scenario. Something like kvm_handle_guest_uaccess_fault()? Definitely > open to other names (especially less verbose names). I've taken the kvm_handle_guest_uaccess_fault() name for now, though I remember you saying something about "uaccess" names being bad because they'll become inaccurate once GPM rolls around? I'll circle back on the names before sending v5 out. > > (WARN_ON_ONCE(vcpu->run->exit_reason != KVM_EXIT_UNKNOWN)) > > As I've stated multiple times, this can't WARN in "normal" builds because userspace > can modify kvm_run fields at will. I do want a WARN as it will allow fuzzers to > find bugs for us, but it needs to be guarded with a Kconfig (or maybe a module > param). One idea would be to make the proposed CONFIG_KVM_PROVE_MMU[*] a generic > Kconfig and use that. For now I've added a KVM_WARN_MEMORY_FAULT_ANNOTATE_ALREADY_POPULATED kconfig: open to suggestions on the name. > I got a bit (ok, way more than a bit) lost in all of the (A) (B) (C) madness. I > think this what you intended for each case? > > (A) if there are any existing paths in KVM_RUN which cause a vCPU > to (1) populate the kvm_run struct then (2) fail a vCPU guest memory > access but ignore the failure and then (3) complete the exit to > userspace set up in (1), then the contents of the kvm_run struct written > in (1) will be corrupted. > > (B) if KVM_RUN fails a guest memory access for which the EFAULT is annotated > but does not return the EFAULT to userspace, then later returns an *un*annotated > EFAULT to userspace, then userspace will receive incorrect information. > > (C) an annotated EFAULT which is ignored/suppressed followed by one which is > propagated to userspace. Here the exit-reason-unset check will prevent the > second annotation from being written, so userspace sees an annotation with > bad contents, If we believe that case (A) is a weird sequence of events > that shouldn't be happening in the first place, then case (C) seems more > important to ensure correctness in. But I don't know anything about how often > (A) happens in KVM, which is why I want others' opinions. Yeah, I got lost in the weeds: you've gotten the important points though > (A) does sadly happen. I wouldn't call it a "pattern" though, it's an unfortunate > side effect of deficiencies in KVM's uAPI. > > (B) is the trickiest to defend against in the kernel, but as I mentioned in earlier > versions of this series, userspace needs to guard against a vCPU getting stuck in > an infinite fault anyways, so I'm not _that_ concerned with figuring out a way to > address this in the kernel. KVM's documentation should strongly encourage userspace > to take action if KVM repeatedly exits with the same info over and over, but beyond > that I think anything else is nice to have, not mandatory. > > (C) should simply not be possible. (A) is very much a "this shouldn't happen, > but it does" case. KVM provides no meaningful guarantees if (A) does happen, so > yes, prioritizing correctness for (C) is far more important. > > That said, prioritizing (C) doesn't mean we can't also do our best to play nice > with (A). None of the existing exits use anywhere near the exit info union's 256 > bytes, i.e. there is tons of space to play with. So rather than put memory_fault > in with all the others, what if we split the union in two, and place memory_fault > in the high half (doesn't have to literally be half, but you get the idea). It'd > kinda be similar to x86's contributory vs. benign faults; exits that can't be > "nested" or "speculative" go in the low half, and things like memory_fault go in > the high half. > > That way, if (A) does occur, the original information will be preserved when KVM > fills memory_fault. And my suggestion to WARN-and-continue limits the problematic > scenarios to just fields in the second union, i.e. just memory_fault for now. > At the very least, not clobbering would likely make it easier for us to debug when > things go sideways. > > And rather than use kvm_run.exit_reason as the canary, we should carve out a > kernel-only, i.e. non-ABI, field from the union. That would allow setting the > canary in common KVM code, which can't be done for kvm_run.exit_reason because > some architectures, e.g. s390 (and x86 IIRC), consume the exit_reason early on > in KVM_RUN. I think this is a good idea :D I was going to suggest something similar a while back, but I thought it would be off the table- whoops. My one concern is that if/when other features eventually also use the "speculative" portion, then they're going to run into the same issues as we're trying to avoid here. But fixing *that* (probably by propagating these exits through return values/the call stack) would be a really big refactor, and C doesn't really have the type system for it in the first place :(