On Thu, Oct 05, 2017 at 01:42:09PM -0500, Bjorn Helgaas wrote:
>On Thu, Oct 05, 2017 at 11:05:12PM +0800, Wei Yang wrote:
>> On Wed, Oct 4, 2017 at 5:15 AM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>> > [+cc Alex, Gavin, Wei]
>> >
>> > On Fri, Sep 29, 2017 at 10:49:38PM -0700, Govindarajulu Varadarajan wrote:
>> >> CPU0                                 CPU1
>> >> ---------------------------------------------------------------------
>> >> __driver_attach()
>> >> device_lock(&dev->mutex) <--- device mutex lock here
>> >> driver_probe_device()
>> >> pci_enable_sriov()
>> >> pci_iov_add_virtfn()
>> >> pci_device_add()
>> >>                                      aer_isr()      <--- pci aer error
>> >>                                      do_recovery()
>> >>                                      broadcast_error_message()
>> >>                                      pci_walk_bus()
>> >>                                      down_read(&pci_bus_sem) <--- rd sem
>> >> down_write(&pci_bus_sem) <-- stuck on wr sem
>> >>                                      report_error_detected()
>> >>                                      device_lock(&dev->mutex) <--- DEAD LOCK
>> >>
>> >> This can also happen when an AER error occurs while pci_dev->sriov_config()
>> >> is called.
>> >>
>> >> This patch does a pci_walk_bus() and adds all the devices to a list. After
>> >> unlocking (up_read) &pci_bus_sem, we go through the list and call the
>> >> err_handler of each device with device_lock() held. This way, we don't try
>> >> to hold both locks at the same time.
>> >
>> > I feel like we're working too hard to come up with an ad hoc solution
>> > for this lock ordering problem: the __driver_attach() path acquires
>> > the device lock, then the pci_bus_sem; the AER path acquires
>> > pci_bus_sem, then the device lock.
>> >
>> > To me, the pci_bus_sem, then device lock order seems natural. The
>> > pci_bus_sem protects all the bus device lists, so it makes sense to
>> > hold it while iterating over those lists. And if we're operating on
>> > one of those devices while we're iterating, it makes sense to acquire
>> > the device lock.
>> >
>> > The pci_enable_sriov() path is the one that feels strange to me.
>> > We're in a driver probe method, and, surprise!, brand-new devices show
>> > up and we basically ask the PCI core to enumerate them synchronously
>> > while still in the probe method.
>> >
>> > Is there some reason this enumeration has to be done synchronously?
>> > I wonder if we can get that piece out of the driver probe path, e.g.,
>> > by queuing up the pci_iov_add_virtfn() part to be done later, in a
>> > path where we're not holding a device lock?
>> >
>>
>> Hi, Bjorn,
>>
>> First let me catch up with the thread.
>>
>> We have two locking sequences:
>> 1. pci_bus_sem -> device lock, which is natural
>> 2. device lock -> pci_bus_sem, which is not
>
>Right. Or at least, that's my assertion :) I could be convinced
>otherwise.
>
>> pci_enable_sriov() sits in class #2 and your suggestion is to move the
>> pci_iov_add_virtfn() to some queue which will avoid case #2.
>>
>> If we want to implement your suggestion, one thing unclear to me is
>> how we would handle the error path. Add a notification for the
>> failure? This would be easy for the core kernel, but a big change
>> for those drivers.
>
>My suggestion was for discussion. It's entirely possible it will turn
>out not to be feasible.
>
>We're only talking about errors from pci_iov_add_virtfn() here. We
>can still return all the other existing errors from sriov_enable(),
>which the driver can see. These errors seem more directly related to
>the PF itself.
>
>The pci_iov_add_virtfn() errors are enumeration-type errors (failure
>to add a bus, failure to read config space of a VF, etc.) These
>feel more like PCI core issues to me. The driver isn't going to be
>able to do anything about them.
>

Ideally, the PF and VF have their own probe functions and don't
interfere with each other. From that point of view, I agree these
failures are not for drivers to handle. In the real implementation,
though, I am not 100% sure the PF driver operates without knowledge
of the enabled VFs.
>The end result would likely be that a VF is enabled in the hardware
>but not added as a PCI device. The same errors can occur during
>boot-time or hotplug-time enumeration of non-SR-IOV devices.
>
>Are these sorts of errors important to the PF driver? If the PF driver
>can get along without them, maybe we can use the same strategy as when
>we enumerate all other devices, i.e., log something in dmesg and
>continue on without the device.
>

Besides the functionality, I have another concern about the behavior
change. Currently, VFs are enabled ALL or NONE; this would add a third
outcome, PARTIAL. For example, the sysadmin asks for 5 VFs but only 3
end up enabled.

Hmm, not a big deal, but we would need to inform the users.

>Bjorn

-- 
Wei Yang
Help you, Help me