On Tue, 6 Nov 2018 11:22:46 +0100 Christian Ehrhardt <christian.ehrhardt@xxxxxxxxxxxxx> wrote: > [...] > > > > > One more try, lightly tested, hopefully the best of both worlds with a > > mutex per bus/slot so that multiple buses/slots can be reset in > > parallel, but the devices affected by those resets, per bus/slot, are > > serialized. Now my multi-function device gets a bus reset with the > > parallel unit tests. Thanks, > > > > Alex > > > > commit dc05ffc2f5a666a68b547f326e4c6061abf9f89a > > Author: Alex Williamson <alex.williamson@xxxxxxxxxx> > > Date: Wed Oct 31 13:22:33 2018 -0600 > > > > vfio/pci: Parallelize device open and release > > Hi Alex, > thanks for your thoughts and patches on this. > I'm glad you were able to come up avoiding the regression for > functions in the same group. > I built and tested your second patch and it works fine with my initial case. > > As with my RFC the threaded and concurrent process based close actions > are now fast - giving userspace a chance to make it faster. > Kill signal/Process exit still is serialized by the __fput cleanup > from the work queue - as expected. > Your patch is even slightly faster (about half a second) than my > initial suggestion. I'd guess that's because your patch only released the global lock around reset, so large parts of the device release were still serialized while now they run entirely in parallel. > Further for some more light tests I ran some general vfio/passthrough > tests which all still worked as they did before. > > Tested-by: Christian Ehrhardt <christian.ehrhardt@xxxxxxxxxxxxx> Thanks, I'll do more testing myself an formally post it with your T-b. > You also added some debug code to track the retval of vfio_pci_try_bus_reset. > Therefore I thought it might be worth to mention that in my case this > reports for every vfio detach: > [ 327.544291] vfio_pci_try_bus_reset: -22 > That would be EINVAL right? > I didn't track down where that came from, please let me know if that > would be important to you. I think I understand it, your device is reset via pci_try_reset_function() because it's a singleton device on the bus (your patch only parallelized across this call) while vfio_pci_try_bus_reset() is looking for cases where releasing a device allows a bus reset to occur when there are other "dirty" devices on the bus. You have no such devices on the bus, so we exit the function with the initial errno value. It's not an error in your case, but for my test case with multiple devices on the bus, it allows me to verify the per-bus serialization and that the last device through does trigger a bus reset. Thanks, Alex