VFs and the rescan/remove lock

"Oliver O'Halloran" <oohall@xxxxxxxxx> · Fri, 14 Aug 2020 15:22:30 +1000

Hello PCI friends,

I've been looking at cleaning up the powerpc PCI error recovery (EEH) paths and
noticed some interesting locking in eeh_rmv_device(). Pruning the surrounding
EEH gunk leaves us with:

if (edev->physfn) {
	pci_iov_remove_virtfn(edev->physfn, edev->vf_index);
} else {
	pci_lock_rescan_remove();
	pci_stop_and_remove_bus_device(edev->pdev);
	pci_unlock_rescan_remove();
}

So for normal devices we're doing the right thing and taking the rescan
lock and for VFs we aren't. pci_iov_remove_virtfn() uses
pci_stop_and_remove_bus_device() internally so I figure it should have the
same locking requirements.

This was pointed out about two years ago while discussing a patch[1] from
Keith to fix the same problem in sriov_numvfs_store(), which also
doesn't take the rescan lock. That case and EEH case above are simple
enough to fix since they can be wrapped in an lock / unlock pair. What is
less easy to fix are when VFs are manipulated in the a driver's .probe()
and .remove() methods.

When a device is first scanned we try to attach a driver in
pci_bus_add_device(). In that context the driver's .probe() will be
called with the rescan lock held. If we attach a driver at a later point
(sysfs bind, module load, or deferred probe) the .probe() function is
called without the rescan lock being held. Admittedly, enabling VFs in
.probe() is pretty unconventional and Bjorn pointed out in [1] there's
only a handful of drivers which do this. Unfortunately, the removal
path has the same issue.

If a device is removed using pci_stop_and_remove_bus_device() (sysfs
remove, hot-unplug, EEH recovery) then .remove() is called with the lock
held. If a driver is unbound "normally" via sysfs then .remove() is called
without the lock held. Unlike the probe case tearing down VFs in .remove()
seems to be pretty universal among PF drivers. IMO it's the correct thing
for a PF driver to be doing too so I'm not sure what to do about this.

The Linux mutex API doesn't permit recursive locking or provide a way to
check if the executing thread owns the lock so we can't just add a hack
the to sriov_enable() / sriov_disable() to take the lock if needed.
Recursion and ownership checking for mutexes is implemented in the kernel,
but it's not part of the public API. I figure we probably shouldn't try
abuse those features :) The best we have is a way to test if the mutex is
locked by someone (possibly not us) and that's of limited use here.

IMO we probably want to change how attaching/detaching drivers works so
we can guarantee that the rescan lock won't be held when the driver
functions are called. That would allow us to have sriov_enable() /
sriov_disable() take the rescan lock when they need it. That will
probably cause some churn since we'd need to ensure drivers are
detached seperately to removing their devices, but I can't think of
anything better. I'm open to other suggestions though.

Oliver

[1] https://patchwork.ozlabs.org/project/linux-pci/patch/20180809163356.18650-1-keith.busch@xxxxxxxxx/