Re: [PATCH 1/8] PCI: Add pcibios_stop_dev()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 5, 2013 at 4:36 PM, Benjamin Herrenschmidt
<benh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, 2013-07-05 at 12:49 -0600, Bjorn Helgaas wrote:
>> We already have pcibios_release_device() (just merged for v3.11).
>> Would that work for you?  I haven't seen your whole series, so I can't
>> tell whether you need something different.
>
> Maybe... Gavin's original hook applies when the device is removed
> from the tree, your new hook when the refcount expires and the
> structure is about to be freed...

Yep, pcibios_release_device() is definitely at a different point.  I
just want to avoid exposing too many points in the device lifetime to
the arch, because it makes it so much harder to refactor things.

> I *suspect* that's fine but I'll need to have a closer look (or wait
> for Gavin to be back from vacation :-)
>
> BTW. One thing we do in EEH is that if we get an error and the driver
> for the device doesn't implement recovery, we simulate a hotplug.
>
> IE. We unplug the device (currently removing it), reset it (or the
> bus it's on, whatever applies, which lifts the fence established by
> the EEH hardware), and re-plug it.
>
> The current method really removes the device from the PCI subsystem,
> and "plugs" it back. IE. We rescan the slot, and might even end up
> re-assigning resources, which I'm not too fan about. It's also unclear
> what quirks gets called in that re-plug case, etc...
>
> I'm wondering whether it might be better to keep the pci_dev around,
> just mark it blocked for user access, unbind the driver, reset, restore
> the BARs to their original values, unblock and re-bind.

Personally, I like the full hotplug approach, even though it will
probably reassign resources, because it uses the standard paths that
should be better-tested, and it's clear what should happen.
Reassigning resources should be OK because we already have to do that
for the non-error handling hot-add case.

I know we have paths where we save config space, do a reset, and
restore config space.  I don't like those because some quirks aren't
applied and I don't believe restoring config space is sufficient to
make the device safe to use.  There could be internal state that we
aren't restoring.  It's also possible that a firmware update means the
device is different than it was before the reset, e.g., BARs could be
different types or sizes.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux