Re: vfio/dev-assignment: potential pci_block_user_cfg_access nesting

Jan Kiszka <jan.kiszka@xxxxxxxxxxx> · Wed, 24 Aug 2011 11:09:13 +0200

On 2011-08-24 00:05, Alex Williamson wrote:
> On Tue, 2011-08-23 at 15:31 +0200, Jan Kiszka wrote:
>> Hi Alex,
>>
>> just ran into some corner case with my reanimated IRQ sharing patches
>> that may affect vfio as well:
>>
>> How are vfio_enable/disable_intx synchronized against all other possible
>> spots that call pci_block_user_cfg_access?
>>
>> I hit the recursion bug check in pci_block_user_cfg_access with my code
>> which takes the user_cfg lock like vfio does. It likely races with
>> pci_reset_function here - and should do so in vfio as well.
> 
> So the race is that we're doing a pci_reset_function and while we've got
> pci_block_user_cfg_access set, an interrupt comes in (maybe from a
> device sharing the interrupt line), and we hit the BUG_ON when trying to
> nest pci_block_user_cfg_access?

Most probably the scenario I was seeing, but I didn't debugged it in all
details as it already locked up my notebook twice.

> 
>> Just taking some lock would mean having to run pci_reset_function with
>> IRQs disabled to synchronize with the IRQ handler (not sure if that is
>> possible at all). Alternatively, we would have to disable the interrupt
>> line or deregister the IRQ while resetting. Or we perform INTx mask
>> manipulation in an unsynchronized fashion, resolving races with user
>> space differently (still need to think about this option).
>>
>> Any other thoughts?
> 
> I think this is a bit easier for vfio since the reset is already routed
> through a vfio ioctl.  We can just use a mutex between the two, reset
> would wait on the mutex while the interrupt handler would skip masking
> of a shared interrupt if it can't get the mutex (hopefully the interrupt
> is really for a shared device or we squelch it via the reset before we
> trigger the spurious interrupt counter).
> 
> I think the only path for kvm assignment that doesn't involve also
> rerouting the reset through a kvm ioctl would have to be avoiding the
> problem in userspace.  We'd have to unregister the interrupt handler,
> reset, then re-register.  That sounds pretty heavy, but the reset is
> already a slow process.  Thanks,

I don't think we can allow userspace to potentially crash the host.

Anyway, this problem turns out to be way more generic. Just run two
"echo 1 > /sys/bus/pci/.../reset" loops on the same device in parallel.
But be warned, you will have to reboot that box afterward.

Maybe this very creative interface of pci_block_user_cfg_access was once
OK when only the IPR SCSI driver used it. But by the time it grew beyond
that use case, it became hopelessly broken (well, open-coded
locking...). We need to redesign it, synchronizing users that can sleep
via a simple mutex and addressing access to the status/command word
separately via an IRQ-save spinlock (as we need that service in hard IRQ
handlers).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html