Re: Hard and silent lock up since linux 3.14 with PCIe pass through (vfio)

On Thu, 2014-10-23 at 18:00 +0200, Andreas Hartmann wrote:
> Alex Williamson wrote:
> > On Wed, 2014-10-22 at 18:22 +0200, Andreas Hartmann wrote:
> >> Alex Williamson wrote:
> >>> --- a/drivers/pci/pci.c
> >>> +++ b/drivers/pci/pci.c
> >>> @@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
> >>>         if (rc != -ENOTTY)
> >>>                 goto done;
> >>>  
> >>> -       rc = pci_pm_reset(dev, probe);
> >>> +       rc = pci_dev_reset_slot_function(dev, probe);
> >>>         if (rc != -ENOTTY)
> >>>                 goto done;
> >>>  
> >>> -       rc = pci_dev_reset_slot_function(dev, probe);
> >>> +       rc = pci_parent_bus_reset(dev, probe);
> >>>         if (rc != -ENOTTY)
> >>>                 goto done;
> >>>  
> >>> -       rc = pci_parent_bus_reset(dev, probe);
> >>> +       rc = pci_pm_reset(dev, probe);
> >>>  done:
> >>>         return rc;
> >>>  }
> >>
> >> This way it's crashing with echo 1 > reset, too.
> > 
> > Ok, so it's somehow related to doing a bus reset with virtual channel
> > save/restore while PM reset with VC save/restore works ok as apparently
> > does bus reset without VC save/restore.  Let's try to do a manual bus
> > reset so we can look at the post reset state of the device before the
> > kernel tries to restore it.
> > 
> > First bind the target device 03:00.0 to pci-stub or vfio-pci so that we
> > know it's not being used.
> > 
> > Next capture lspci -xxxx -s 3:00.0 so we have the starting state.
> > 
> > Then we'll do a bus reset using setpci:
> > # setpci -s 00:05.0 3e.w=40:40
> > <if you script this, wait at least 2ms here>
> > # setpci -s 00:05.0 3e.w=00:40
> > <wait 1 second here>
> > 
> > Now re-capture lspci -xxxx -s 3:00.0
> 
> The machine is booted w/ vfio bound to 3:00.0 as usual (now for testing
> linux 3.14)
> 
> lspci -xxxx -s 3:00.0
> setpci -s 00:05.0 3e.w=40:40
> usleep 10
> setpci -s 00:05.0 3e.w=00:40
> sleep 1
> lspci -xxxx -s 3:00.0
> 
> I didn't get the second lspci because the machine already was hanging.
> The first output is attached completely.

Hmm, that doesn't make much sense.  You had found that if you disabled
the VC save/restore then QEMU works.  That should have still been using
secondary bus reset as we're trying to do here, so I don't understand
why we can't do a manual secondary bus reset now.
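For anyone repeating the test, the manual reset sequence quoted above could be scripted roughly as follows. This is only a sketch under assumptions: the bridge (00:05.0) and device (3:00.0) addresses are the ones from this thread, DRY_RUN and the run helper are hypothetical names added so the commands can be printed and reviewed before anything touches hardware, and the fractional sleep assumes a sleep that accepts decimals (GNU coreutils does). Note that the quoted run used "usleep 10", which is 10 microseconds, shorter than the 2ms hold time asked for; the sketch holds reset for 10ms instead.

```shell
#!/bin/sh
# Sketch: manual secondary bus reset through the bridge's Bridge Control
# register (offset 0x3e, bit 0x40 = Secondary Bus Reset).
# BRIDGE and DEV are the addresses used in this thread.
# DRY_RUN is a hypothetical safety switch: 1 prints commands, 0 runs them.
BRIDGE="${BRIDGE:-00:05.0}"
DEV="${DEV:-3:00.0}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bus_reset() {
    run lspci -xxxx -s "$DEV"              # starting state
    run setpci -s "$BRIDGE" 3e.w=40:40     # assert Secondary Bus Reset
    run sleep 0.01                         # hold for 10ms (at least 2ms requested)
    run setpci -s "$BRIDGE" 3e.w=00:40     # deassert Secondary Bus Reset
    run sleep 1                            # give the device time to come back
    run lspci -xxxx -s "$DEV"              # post-reset state
}
```

Calling bus_reset with the default DRY_RUN=1 only prints the six commands; running it as root with DRY_RUN=0 performs the reset, and on the system in this thread that may well hang the machine.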

If you apply both Bjorn's previous patch to disable VC save/restore and my
patch to reorder the reset mechanisms, does echo 1 > reset on the device's
sysfs entry still cause a hang?
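For reference, a minimal sketch of that sysfs reset. The helper names reset_path and sysfs_reset are hypothetical, and the PCI domain 0000 in the example address is an assumption (it is the usual default, but check lspci -D on the system in question).

```shell
#!/bin/sh
# Sketch: trigger a function reset through the device's sysfs "reset" attribute.

# Build the sysfs path for a full PCI address such as 0000:03:00.0.
reset_path() {
    echo "/sys/bus/pci/devices/$1/reset"
}

# Write 1 to the reset attribute if it exists and is writable (needs root).
sysfs_reset() {
    p="$(reset_path "$1")"
    if [ -w "$p" ]; then
        echo 1 > "$p"
    else
        echo "no writable reset attribute at $p" >&2
        return 1
    fi
}
```

Usage would be something like sysfs_reset 0000:03:00.0, run as root with both patches applied.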

Can you provide a link to the specific model for this card?  Thanks,

Alex
