[+cc Alex, Lukas for link-disable reset thoughts, beginning of thread: https://lore.kernel.org/all/cover.1679502371.git.petrm@xxxxxxxxxx/] On Sun, Mar 26, 2023 at 04:53:58PM +0300, Ido Schimmel wrote: > On Thu, Mar 23, 2023 at 11:51:15AM -0500, Bjorn Helgaas wrote: > > On Wed, Mar 22, 2023 at 05:49:35PM +0100, Petr Machata wrote: > > > From: Amit Cohen <amcohen@xxxxxxxxxx> > > > > > > The driver resets the device during probe and during a devlink reload. > > > The current reset method reloads the current firmware version or a pending > > > one, if one was previously flashed using devlink. However, the reset does > > > not take down the PCI link, preventing the PCI firmware from being > > > upgraded, unless the system is rebooted. > > > > Just to make sure I understand this correctly, the above sounds like > > "firmware" includes two parts that have different rules for loading: > > > > - Current reset method is completely mlxsw-specific and resets the > > mlxsw core but not the PCIe interface; this loads only firmware > > part A > > > > - A PCIe reset resets both the mlxsw core and the PCIe interface; > > this loads both firmware part A and part B > > Yes. A few years ago I had to flash a new firmware in order to test a > fix in the PCIe firmware and the bug still reproduced after a devlink > reload. Only after a reboot the new PCIe firmware was loaded and the bug > was fixed. Bugs in PCIe firmware are not common, but we would like to > avoid the scenario where users must reboot the machine in order to load > the new firmware. > > > > To solve this problem, a new reset command (6) was implemented in the > > > firmware. Unlike the current command (1), after issuing the new command > > > the device will not start the reset immediately, but only after the PCI > > > link was disabled. The driver is expected to wait for 500ms before > > > re-enabling the link to give the firmware enough time to start the reset. > > > > I guess the idea here is that the mlxsw driver: > > > > - Tells the firmware we're going to reset > > (MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE) > > > > - Saves PCI config state > > > > - Disables the link (mlxsw_pci_link_toggle()), which causes a PCIe > > hot reset > > > > - The firmware notices the link disable and starts its own internal > > reset > > > > - The mlxsw driver waits 500ms > > (MLXSW_PCI_TOGGLE_WAIT_BEFORE_EN_MSECS) > > > > - Enables link and waits for it to be active > > (mlxsw_pci_link_active_check() > > > > - Waits for device to be ready again (mlxsw_pci_device_id_read()) > > Correct. > > > So the first question is why you don't simply use > > pci_reset_function(), since it is supposed to cause a reset and do all > > the necessary waiting for the device to be ready. This is quite > > complicated to do correctly; in fact, we still discover issues there > > regularly. There are many special cases in PCIe r6.0, sec 6.6.1, and > > it would be much better if we can avoid trying to handle them all in > > individual drivers. > > I see that this function takes the device lock and I think (didn't try) > it will deadlock if we were to replace the current code with it since we > also perform a reset during probe where I believe the device lock is > already taken. > > __pci_reset_function_locked() is another option, but it assumes the > device lock was already taken, which is correct during probe, but not > when reset is performed as part of devlink reload. > > Let's put the locking issues aside and assume we can use > __pci_reset_function_locked(). I'm trying to figure out what it can > allow us to remove from the driver in favor of common PCI code. It > essentially invokes one of the supported reset methods. Looking at my > device, I see the following: > > # cat /sys/class/pci_bus/0000\:01/device/0000\:01\:00.0/reset_method > pm bus > > So I assume it will invoke pci_pm_reset(). I'm not sure it can work for > us as our reset procedure requires us to disable the link on the > downstream port as a way of notifying the device that it should start > the reset procedure. Hmmm, pci_pm_reset() puts the device in D3hot, then back to D0. Spec says that results in "undefined internal Function state," which doesn't even sound like a guaranteed reset, but it's what we have, and in any case, it does not disable the link. > We might be able to use the "device_specific" method and add quirks in > "pci_dev_reset_methods". However, I'm not sure what would be the > benefit, as it basically means moving the code in > mlxsw_pci_link_toggle() to drivers/pci/quirks.c. Also, when the "probe" > argument is "true" we can't actually determine if this reset method is > supported or not, as we can't query that from the configuration space of > the device in the current implementation. It's queried using a command > interface that is specific to mlxsw and resides in the driver itself. > Not usable from drivers/pci/quirks.c. Spec (PCIe r6.0, sec 6.6.1) says "Disabling a Link causes Downstream components to undergo a hot reset." That seems like it *could* be a general-purpose method of doing a reset, and I don't know why the PCI core doesn't support it. Added Alex and Lukas in case they know. But it sounds like there's some wrinkle with your device? I suppose a link disable actually causes a reset, but that reset may not trigger the firmware reload you need? If we had a generic "disable link" reset method, maybe a device quirk could disable it if necessary? > > Of course, pci_reset_function() does *not* include details like > > MLXSW_PCI_TOGGLE_WAIT_BEFORE_EN_MSECS. > > > > I assume that flashing the firmware to the device followed by a power > > cycle (without ever doing MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE) > > would load the new firmware everywhere. Can we not do the same with a > > PCIe reset? > > Yes, that's what we would like to achieve. > > Thanks for the feedback!