Re: [PATCH net-next 6/6] mlxsw: pci: Add support for new reset flow

Ido Schimmel <idosch@xxxxxxxxxx> · Thu, 13 Apr 2023 11:10:54 +0300

On Thu, Mar 30, 2023 at 12:49:58PM -0600, Alex Williamson wrote:
> On Thu, 30 Mar 2023 11:26:30 +0300
> Ido Schimmel <idosch@xxxxxxxxxx> wrote:
> 
> > On Wed, Mar 29, 2023 at 11:10:01AM -0600, Alex Williamson wrote:
> > > I think we don't have it because it's unclear how it's actually
> > > different from a secondary bus reset from the bridge control register,
> > > which is what "bus" would do when selected for the example above.  Per
> > > the spec, both must cause a hot reset.  It seems this device needs a
> > > significantly longer delay though.  
> > 
> > Assuming you are referring to the 2ms sleep in
> > pci_reset_secondary_bus(), then yes. In our case, after disabling the
> > link on the downstream port we need to wait for 500ms before enabling
> > it.
> > 
> > > Note that hot resets can be generated by a userspace driver with
> > > ownership of the device and will make use of the pci-core reset
> > > mechanisms.  Therefore if there is not a device specific reset, we'll
> > > use the standard delays and the device ought not to get itself wedged
> > > if the link becomes active at an unexpected point relative to a
> > > firmware update.  This might be a point in favor of a device specific
> > > reset solution in pci-core.  Thanks,  
> > 
> > I assume you referring to something like this:
> > 
> > # echo 1 > /sys/class/pci_bus/0000:03/device/0000:03:00.0/reset
> > 
> > Doesn't seem to have any effect (network ports remain up, at least).
> > Anyway, this device is completely managed by the kernel, not a user
> > space driver. I'm not aware of anyone using this method to reset the
> > device.
> 
> The pci-sysfs reset attribute is only meant to reset the linked device,
> so if this is a single function device then it might be accessing bus
> reset, but it also might be using FLR or PM reset.  There's a
> reset_method attribute to determine and select.
> 
> In any case, if the device is unaffected, that suggests we're dealing
> with a device that doesn't comply with PCIe reset standards, which
> might suggests it needs a device specific reset or to flag broken reset
> methods regardless.
> 
> Note that QEMU is a vfio-pci userspace driver, so assigning the device
> to a VM, where kernel drivers in the guest are managing the device is a
> use case of userspace drivers which should have a functional reset
> mechanism to avoid data leakage between userspace sessions.
>  
> > If I understand Bjorn and you correctly, we have two options:
> > 
> > 1. Keep the current implementation inside the driver.
> > 
> > 2. Call __pci_reset_function_locked() from the driver and move the link
> > toggling to drivers/pci/quirks.c as a "device_specific" method.
> > 
> > Personally, I don't see any benefit in 2, but we can try to implement
> > it, see if it even works and then decide.
> 
> The second option enables use cases like above, where the PCI-core can
> perform an effective reset of the device rather than embedding that
> into a specific driver.  Even if not intended as a primary use case,
> it's a more complete solution and avoids potentially unhappy users that
> assume such use cases are available.  Thanks,

I believe we are discussing three different issues:

1. Reset by PCI core when driver is bound to the device.
2. Reset by PCI core when driver is not bound to the device.
3. Reset by the driver itself during probe / devlink reload.

These are my notes on each:

1. In this case, the reset_prepare() and reset_done() handlers of the
driver are invoked by the PCI core before and after the PCI reset.
Without them, if the used PCI reset method indeed reset the entire
device, then the driver and the device would be out of sync. I have
implemented these handlers for the driver:
https://github.com/idosch/linux/commit/28bc0dc06c01559c19331578bbba9f2b0460ab5d.patch

2. In this case, the handlers are not available and we only need to
ensure that the used PCI reset method reset the device. The method can
be a generic method such as "pm" or "bus" (which are available in my
case) or a "device_specific" method that is implemented in
drivers/pci/quirks.c by accessing the configuration space of the device.

I need to check if one of the generic methods works for this device and
if not, check with the PCI team what we can do about it.

3. In this case, the driver already issues a reset via its command
interface during probe / devlink reload to ensure it is working with a
device in a clean state. The patch we are discussing merely adds another
reset method. Instead of starting the reset when the command is issued,
the reset will start after toggling the link on the downstream port.

To summarize, I would like to proceed as follows:

1. Submit the following patch that implements the PCI reset handlers for
the device:
https://github.com/idosch/linux/commit/28bc0dc06c01559c19331578bbba9f2b0460ab5d.patch

2. Check if one of the generic reset methods works for this device and
if not, check with the PCI team what we can do about it.

3. Re-submit the current patchset for inclusion.

Note that 2 does not block 3 since the solution for 2 would be either in
the firmware or in drivers/pci/quirks.c.

Please let me know if you are OK with the above.

Thanks