On Wed, Oct 09, 2024 at 11:27:52PM +0000, Oliver Upton wrote:
> On Wed, Oct 09, 2024 at 12:36:32PM -0700, Sean Christopherson wrote:
> > On Wed, Oct 09, 2024, Oliver Upton wrote:
> > > On Wed, Oct 09, 2024 at 07:36:03PM +0100, Marc Zyngier wrote:
> > > > As there is very little ordering in the KVM API, userspace can
> > > > instantiate a half-baked GIC (missing its memory map, for example)
> > > > at almost any time.
> > > >
> > > > This means that, with the right timing, a thread running vcpu-0
> > > > can enter the kernel without a GIC configured and get a GIC created
> > > > behind its back by another thread. Amusingly, it will pick up
> > > > that GIC and start messing with the data structures without the
> > > > GIC having been fully initialised.
> > >
> > > Huh, I'm definitely missing something. Could you remind me where we open
> > > up this race between KVM_RUN && kvm_vgic_create()?
>
> Ah, duh, I see it now. kvm_arch_vcpu_run_pid_change() doesn't serialize
> on a VM lock, and kvm_vgic_map_resources() has an early return for
> vgic_ready() letting it blow straight past the config_lock.
>
> Then if we can't register the MMIO region for the distributor
> everything comes crashing down and a vCPU has made it into the KVM_RUN
> loop w/ the VGIC-shaped rug pulled out from under it. There's definitely
> another functional bug here where a vCPU's attempts to poke the

a theoretical bug, that is. In practice the window to race against
likely isn't big enough to get the in-guest vCPU to the point of poking
the halfway-initialized distributor.

> distributor wind up reaching userspace as MMIO exits. But we can worry
> about that another day.
>
> If memory serves, kvm_vgic_map_resources() used to do all of this behind
> the config_lock to cure the race, but that wound up inverting lock
> ordering on srcu.
>
> Note to self: Impose strict ordering on GIC initialization v. vCPU
> creation if/when we get a new flavor of irqchip.
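For anyone following along, the check-then-act shape being described can be
sketched roughly as below. This is purely an illustration with made-up names
(fake_vgic, map_resources_racy, etc.), not the actual KVM code: an unlocked
"ready" check that returns early lets one thread skip the lock entirely while
another thread is still mid-initialization under it.

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the vgic state; "ready" plays the role of
 * vgic_ready() and "mmio_base" the distributor mapping that may not be
 * set up yet when another thread observes ready == true.
 */
struct fake_vgic {
	bool ready;
	unsigned long mmio_base;
};

static struct fake_vgic vgic;
static pthread_mutex_t config_lock = PTHREAD_MUTEX_INITIALIZER;

/* Racy shape: the early return blows straight past config_lock. */
static int map_resources_racy(void)
{
	if (vgic.ready)		/* unlocked check... */
		return 0;	/* ...skips the lock entirely */

	pthread_mutex_lock(&config_lock);
	vgic.mmio_base = 0x8000;	/* imagine this step can still fail */
	vgic.ready = true;		/* published before callers re-check */
	pthread_mutex_unlock(&config_lock);
	return 0;
}

/* Safer shape: re-check "ready" only while holding the lock. */
static int map_resources_safe(void)
{
	pthread_mutex_lock(&config_lock);
	if (!vgic.ready) {
		vgic.mmio_base = 0x8000;
		vgic.ready = true;
	}
	pthread_mutex_unlock(&config_lock);
	return 0;
}
```

Calling map_resources_safe() from two threads is harmless because the
second caller serializes on config_lock before looking at "ready"; the
racy variant can return success to a caller while the mapping is still
being built.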
>
> > > I'd thought the fact that the latter takes all the vCPU mutexes and
> > > checks if any vCPU in the VM has run would be enough to guard against
> > > such a race, but clearly not...
> >
> > Any chance that fixing bugs where vCPU0 can be accessed (and run!) before
> > it's fully online help?
>
> That's an equally gross bug, but kvm_vgic_create() should still be safe
> w.r.t. vCPU creation since both hold the kvm->lock in the right spot.
> That is, since kvm_vgic_create() is called under the lock any vCPUs
> visible to userspace should exist in the vCPU xarray.
>
> The crappy assumption here is kvm_arch_vcpu_run_pid_change() and its
> callees are allowed to destroy VM-scoped structures in error handling.
>
> > E.g. if that closes the vCPU0 hole, maybe the vCPU1 case can
> > be handled a bit more gracefully?
>
> I think this is about as graceful as we can be. The sorts of screw-ups
> that precipitate this error handling may involve stupidity across
> several KVM ioctls, meaning it is highly unlikely to be attributable /
> recoverable.
>
> --
> Thanks,
> Oliver

--
Thanks,
Oliver