Hi Bjorn, On Thu, Mar 23, 2023 at 11:51:15AM -0500, Bjorn Helgaas wrote: > Hi Petr, thanks for pointing me here. > > On Wed, Mar 22, 2023 at 05:49:35PM +0100, Petr Machata wrote: > > From: Amit Cohen <amcohen@xxxxxxxxxx> > > > > The driver resets the device during probe and during a devlink reload. > > The current reset method reloads the current firmware version or a pending > > one, if one was previously flashed using devlink. However, the reset does > > not take down the PCI link, preventing the PCI firmware from being > > upgraded, unless the system is rebooted. > > Just to make sure I understand this correctly, the above sounds like > "firmware" includes two parts that have different rules for loading: > > - Current reset method is completely mlxsw-specific and resets the > mlxsw core but not the PCIe interface; this loads only firmware > part A > > - A PCIe reset resets both the mlxsw core and the PCIe interface; > this loads both firmware part A and part B Yes. A few years ago I had to flash a new firmware in order to test a fix in the PCIe firmware and the bug still reproduced after a devlink reload. Only after a reboot the new PCIe firmware was loaded and the bug was fixed. Bugs in PCIe firmware are not common, but we would like to avoid the scenario where users must reboot the machine in order to load the new firmware. > > > To solve this problem, a new reset command (6) was implemented in the > > firmware. Unlike the current command (1), after issuing the new command > > the device will not start the reset immediately, but only after the PCI > > link was disabled. The driver is expected to wait for 500ms before > > re-enabling the link to give the firmware enough time to start the reset. > > I guess the idea here is that the mlxsw driver: > > - Tells the firmware we're going to reset > (MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE) > > - Saves PCI config state > > - Disables the link (mlxsw_pci_link_toggle()), which causes a PCIe > hot reset > > - The firmware notices the link disable and starts its own internal > reset > > - The mlxsw driver waits 500ms > (MLXSW_PCI_TOGGLE_WAIT_BEFORE_EN_MSECS) > > - Enables link and waits for it to be active > (mlxsw_pci_link_active_check() > > - Waits for device to be ready again (mlxsw_pci_device_id_read()) Correct. > > So the first question is why you don't simply use > pci_reset_function(), since it is supposed to cause a reset and do all > the necessary waiting for the device to be ready. This is quite > complicated to do correctly; in fact, we still discover issues there > regularly. There are many special cases in PCIe r6.0, sec 6.6.1, and > it would be much better if we can avoid trying to handle them all in > individual drivers. I see that this function takes the device lock and I think (didn't try) it will deadlock if we were to replace the current code with it since we also perform a reset during probe where I believe the device lock is already taken. __pci_reset_function_locked() is another option, but it assumes the device lock was already taken, which is correct during probe, but not when reset is performed as part of devlink reload. Let's put the locking issues aside and assume we can use __pci_reset_function_locked(). I'm trying to figure out what it can allow us to remove from the driver in favor of common PCI code. It essentially invokes one of the supported reset methods. Looking at my device, I see the following: # cat /sys/class/pci_bus/0000\:01/device/0000\:01\:00.0/reset_method pm bus So I assume it will invoke pci_pm_reset(). I'm not sure it can work for us as our reset procedure requires us to disable the link on the downstream port as a way of notifying the device that it should start the reset procedure. We might be able to use the "device_specific" method and add quirks in "pci_dev_reset_methods". However, I'm not sure what would be the benefit, as it basically means moving the code in mlxsw_pci_link_toggle() to drivers/pci/quirks.c. Also, when the "probe" argument is "true" we can't actually determine if this reset method is supported or not, as we can't query that from the configuration space of the device in the current implementation. It's queried using a command interface that is specific to mlxsw and resides in the driver itself. Not usable from drivers/pci/quirks.c. > > Of course, pci_reset_function() does *not* include details like > MLXSW_PCI_TOGGLE_WAIT_BEFORE_EN_MSECS. > > I assume that flashing the firmware to the device followed by a power > cycle (without ever doing MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE) > would load the new firmware everywhere. Can we not do the same with a > PCIe reset? Yes, that's what we would like to achieve. Thanks for the feedback!