Re: vfio/dev-assignment: potential pci_block_user_cfg_access nesting

Alex Williamson <alex.williamson@xxxxxxxxxx> · Wed, 24 Aug 2011 09:10:47 -0600

On Wed, 2011-08-24 at 11:09 +0200, Jan Kiszka wrote:
> On 2011-08-24 00:05, Alex Williamson wrote:
> > On Tue, 2011-08-23 at 15:31 +0200, Jan Kiszka wrote:
> >> Hi Alex,
> >>
> >> just ran into some corner case with my reanimated IRQ sharing patches
> >> that may affect vfio as well:
> >>
> >> How are vfio_enable/disable_intx synchronized against all other possible
> >> spots that call pci_block_user_cfg_access?
> >>
> >> I hit the recursion bug check in pci_block_user_cfg_access with my code
> >> which takes the user_cfg lock like vfio does. It likely races with
> >> pci_reset_function here - and should do so in vfio as well.
> > 
> > So the race is that we're doing a pci_reset_function and while we've got
> > pci_block_user_cfg_access set, an interrupt comes in (maybe from a
> > device sharing the interrupt line), and we hit the BUG_ON when trying to
> > nest pci_block_user_cfg_access?
> 
> Most probably the scenario I was seeing, but I didn't debug it in full
> detail as it already locked up my notebook twice.
> 
> > 
> >> Just taking some lock would mean having to run pci_reset_function with
> >> IRQs disabled to synchronize with the IRQ handler (not sure if that is
> >> possible at all). Alternatively, we would have to disable the interrupt
> >> line or deregister the IRQ while resetting. Or we perform INTx mask
> >> manipulation in an unsynchronized fashion, resolving races with user
> >> space differently (still need to think about this option).
> >>
> >> Any other thoughts?
> > 
> > I think this is a bit easier for vfio since the reset is already routed
> > through a vfio ioctl.  We can just use a mutex between the two, reset
> > would wait on the mutex while the interrupt handler would skip masking
> > of a shared interrupt if it can't get the mutex (hopefully the interrupt
> > is really for a shared device or we squelch it via the reset before we
> > trigger the spurious interrupt counter).
> > 
> > I think the only path for kvm assignment that doesn't involve also
> > rerouting the reset through a kvm ioctl would have to be avoiding the
> > problem in userspace.  We'd have to unregister the interrupt handler,
> > reset, then re-register.  That sounds pretty heavy, but the reset is
> > already a slow process.  Thanks,
> 
> I don't think we can allow userspace to potentially crash the host.
> 
> Anyway, this problem turns out to be way more generic. Just run two
> "echo 1 > /sys/bus/pci/.../reset" loops on the same device in parallel.
> But be warned, you will have to reboot that box afterward.
> 
> Maybe this very creative interface of pci_block_user_cfg_access was once
> OK when only the IPR SCSI driver used it. But by the time it grew beyond
> that use case, it became hopelessly broken (well, open-coded
> locking...). We need to redesign it, synchronizing users that can sleep
> via a simple mutex and addressing access to the status/command word
> separately via an IRQ-save spinlock (as we need that service in hard IRQ
> handlers).

Yep, that sounds like the best path.  pci_block_user_cfg_access is at
best "fragile" in its current implementation.  Thanks,

Alex
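
For illustration only, the split Jan describes above could be modelled in
userspace roughly as follows. This is a minimal sketch, not kernel code
and not a proposed patch: struct model_dev and every model_* name are
hypothetical, and both locks are stood in for by C11 atomic_flag. In the
kernel the first would presumably be a struct mutex that sleepable
callers (reset, blocked user config access) wait on, and the second a
spinlock_t taken with spin_lock_irqsave() around the status/command
word. Only the two register bit values are taken from the PCI headers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CMD_INTX_DISABLE 0x0400u /* same bit as PCI_COMMAND_INTX_DISABLE */
#define MODEL_STATUS_INTERRUPT 0x0008u /* same bit as PCI_STATUS_INTERRUPT */

/* Hypothetical model of one assigned device. */
struct model_dev {
    atomic_flag cfg_mutex; /* stand-in for a sleeping mutex (reset etc.) */
    atomic_flag cmd_lock;  /* stand-in for an IRQ-safe spinlock */
    uint16_t command;      /* modelled PCI command word */
    uint16_t status;       /* modelled PCI status word */
};

static void model_dev_init(struct model_dev *d)
{
    atomic_flag_clear(&d->cfg_mutex);
    atomic_flag_clear(&d->cmd_lock);
    d->command = 0;
    d->status = 0;
}

/*
 * Sleepable paths only. Returns false if another sleeper holds the
 * lock; a real mutex_lock() would sleep here instead of failing, and
 * nothing would need an open-coded recursion check.
 */
static bool model_cfg_trylock(struct model_dev *d)
{
    return !atomic_flag_test_and_set(&d->cfg_mutex);
}

static void model_cfg_unlock(struct model_dev *d)
{
    atomic_flag_clear(&d->cfg_mutex);
}

/* Short critical section around the status/command word. */
static void model_cmd_lock(struct model_dev *d)
{
    while (atomic_flag_test_and_set(&d->cmd_lock))
        ; /* spin; held only for a few instructions */
}

static void model_cmd_unlock(struct model_dev *d)
{
    atomic_flag_clear(&d->cmd_lock);
}

/*
 * Hard-IRQ path: takes only cmd_lock, never cfg_mutex, so it cannot
 * nest with a reset in progress. Returns true if the device was
 * asserting INTx and has now been masked.
 */
static bool model_irq_handler(struct model_dev *d)
{
    bool ours;

    model_cmd_lock(d);
    ours = (d->status & MODEL_STATUS_INTERRUPT) != 0;
    if (ours)
        d->command |= MODEL_CMD_INTX_DISABLE;
    model_cmd_unlock(d);
    return ours;
}

/* Reset path: serialised against other sleepers by cfg_mutex. */
static bool model_reset(struct model_dev *d)
{
    if (!model_cfg_trylock(d))
        return false; /* the kernel version would wait instead */

    model_cmd_lock(d);
    d->command = 0; /* the modelled reset clears both words */
    d->status = 0;
    model_cmd_unlock(d);

    model_cfg_unlock(d);
    return true;
}
```

The point of the split is that the interrupt handler and the reset path
no longer compete for the same "blocked" flag: two resets exclude each
other through cfg_mutex, while an interrupt arriving in the middle of a
reset only contends briefly on cmd_lock.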
