On Wed, Oct 09, 2024 at 12:36:32PM -0700, Sean Christopherson wrote:
> On Wed, Oct 09, 2024, Oliver Upton wrote:
> > On Wed, Oct 09, 2024 at 07:36:03PM +0100, Marc Zyngier wrote:
> > > As there is very little ordering in the KVM API, userspace can
> > > instantiate a half-baked GIC (missing its memory map, for example)
> > > at almost any time.
> > >
> > > This means that, with the right timing, a thread running vcpu-0
> > > can enter the kernel without a GIC configured and get a GIC created
> > > behind its back by another thread. Amusingly, it will pick up
> > > that GIC and start messing with the data structures without the
> > > GIC having been fully initialised.
> >
> > Huh, I'm definitely missing something. Could you remind me where we
> > open up this race between KVM_RUN && kvm_vgic_create()?

Ah, duh, I see it now. kvm_arch_vcpu_run_pid_change() doesn't serialize on
a VM lock, and kvm_vgic_map_resources() has an early return for
vgic_ready(), letting it blow straight past the config_lock. Then, if we
can't register the MMIO region for the distributor, everything comes
crashing down and a vCPU has made it into the KVM_RUN loop w/ the
VGIC-shaped rug pulled out from under it.

There's definitely another functional bug here where a vCPU's attempts to
poke the distributor wind up reaching userspace as MMIO exits. But we can
worry about that another day.

If memory serves, kvm_vgic_map_resources() used to do all of this behind
the config_lock to cure the race, but that wound up inverting lock
ordering on srcu.

Note to self: Impose strict ordering on GIC initialization v. vCPU
creation if/when we get a new flavor of irqchip.

> > I'd thought the fact that the latter takes all the vCPU mutexes and
> > checks if any vCPU in the VM has run would be enough to guard against
> > such a race, but clearly not...
>
> Any chance that fixing bugs where vCPU0 can be accessed (and run!)
> before it's fully online helps?

That's an equally gross bug, but kvm_vgic_create() should still be safe
w.r.t. vCPU creation, since both hold the kvm->lock in the right spot.
That is, since kvm_vgic_create() is called under the lock, any vCPUs
visible to userspace should exist in the vCPU xarray.

The crappy assumption here is that kvm_arch_vcpu_run_pid_change() and its
callees are allowed to destroy VM-scoped structures in error handling.

> E.g. if that closes the vCPU0 hole, maybe the vCPU1 case can
> be handled a bit more gracefully?

I think this is about as graceful as we can be. The sorts of screw-ups
that precipitate this error handling may involve stupidity across several
KVM ioctls, meaning it is highly unlikely to be attributable /
recoverable.

--
Thanks,
Oliver
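
For anyone digging through the archives later: the fast path in question is
shaped roughly like the sketch below. This is a hand-waved paraphrase meant
only to show where the window is, not the exact upstream code, so take the
details with a grain of salt.

	int kvm_vgic_map_resources(struct kvm *kvm)
	{
		int ret = 0;

		/*
		 * Unlocked fast path: nothing serializes this against a
		 * concurrent (and possibly failing) initialisation done
		 * under the locks below, so a vCPU thread can sail
		 * straight on into KVM_RUN.
		 */
		if (likely(vgic_ready(kvm)))
			return 0;

		mutex_lock(&kvm->slots_lock);
		mutex_lock(&kvm->arch.config_lock);

		if (vgic_ready(kvm))
			goto out;

		/*
		 * Mapping the distributor and registering its MMIO region
		 * happens around here and can fail, at which point the
		 * vgic gets torn down even though another vCPU may already
		 * be committed to the run loop.
		 */

	out:
		mutex_unlock(&kvm->arch.config_lock);
		mutex_unlock(&kvm->slots_lock);
		return ret;
	}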