Re: [PATCH net-next 6/6] mlxsw: pci: Add support for new reset flow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 29 Mar 2023 11:01:44 -0500
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> [+cc Alex, Lukas for link-disable reset thoughts, beginning of thread:
> https://lore.kernel.org/all/cover.1679502371.git.petrm@xxxxxxxxxx/]
> 
> On Sun, Mar 26, 2023 at 04:53:58PM +0300, Ido Schimmel wrote:
> > On Thu, Mar 23, 2023 at 11:51:15AM -0500, Bjorn Helgaas wrote:  
> > > On Wed, Mar 22, 2023 at 05:49:35PM +0100, Petr Machata wrote:  
> > > > From: Amit Cohen <amcohen@xxxxxxxxxx>
> > > > 
> > > > The driver resets the device during probe and during a devlink reload.
> > > > The current reset method reloads the current firmware version or a pending
> > > > one, if one was previously flashed using devlink. However, the reset does
> > > > not take down the PCI link, preventing the PCI firmware from being
> > > > upgraded, unless the system is rebooted.  
> > > 
> > > Just to make sure I understand this correctly, the above sounds like
> > > "firmware" includes two parts that have different rules for loading:
> > > 
> > >   - Current reset method is completely mlxsw-specific and resets the
> > >     mlxsw core but not the PCIe interface; this loads only firmware
> > >     part A
> > > 
> > >   - A PCIe reset resets both the mlxsw core and the PCIe interface;
> > >     this loads both firmware part A and part B  
> > 
> > Yes. A few years ago I had to flash a new firmware in order to test a
> > fix in the PCIe firmware and the bug still reproduced after a devlink
> > reload. Only after a reboot the new PCIe firmware was loaded and the bug
> > was fixed. Bugs in PCIe firmware are not common, but we would like to
> > avoid the scenario where users must reboot the machine in order to load
> > the new firmware.
> >   
> > > > To solve this problem, a new reset command (6) was implemented in the
> > > > firmware. Unlike the current command (1), after issuing the new command
> > > > the device will not start the reset immediately, but only after the PCI
> > > > link was disabled. The driver is expected to wait for 500ms before
> > > > re-enabling the link to give the firmware enough time to start the reset.  
> > > 
> > > I guess the idea here is that the mlxsw driver:
> > > 
> > >   - Tells the firmware we're going to reset
> > >     (MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE)
> > > 
> > >   - Saves PCI config state
> > > 
> > >   - Disables the link (mlxsw_pci_link_toggle()), which causes a PCIe
> > >     hot reset
> > > 
> > >   - The firmware notices the link disable and starts its own internal
> > >     reset
> > > 
> > >   - The mlxsw driver waits 500ms
> > >     (MLXSW_PCI_TOGGLE_WAIT_BEFORE_EN_MSECS)
> > > 
> > >   - Enables link and waits for it to be active
> > >     (mlxsw_pci_link_active_check()
> > > 
> > >   - Waits for device to be ready again (mlxsw_pci_device_id_read())  
> > 
> > Correct.
> >   
> > > So the first question is why you don't simply use
> > > pci_reset_function(), since it is supposed to cause a reset and do all
> > > the necessary waiting for the device to be ready.  This is quite
> > > complicated to do correctly; in fact, we still discover issues there
> > > regularly.  There are many special cases in PCIe r6.0, sec 6.6.1, and
> > > it would be much better if we can avoid trying to handle them all in
> > > individual drivers.  
> > 
> > I see that this function takes the device lock and I think (didn't try)
> > it will deadlock if we were to replace the current code with it since we
> > also perform a reset during probe where I believe the device lock is
> > already taken.
> > 
> > __pci_reset_function_locked() is another option, but it assumes the
> > device lock was already taken, which is correct during probe, but not
> > when reset is performed as part of devlink reload.
> > 
> > Let's put the locking issues aside and assume we can use
> > __pci_reset_function_locked(). I'm trying to figure out what it can
> > allow us to remove from the driver in favor of common PCI code. It
> > essentially invokes one of the supported reset methods. Looking at my
> > device, I see the following:
> > 
> >  # cat /sys/class/pci_bus/0000\:01/device/0000\:01\:00.0/reset_method 
> >  pm bus
> > 
> > So I assume it will invoke pci_pm_reset(). I'm not sure it can work for
> > us as our reset procedure requires us to disable the link on the
> > downstream port as a way of notifying the device that it should start
> > the reset procedure.  
> 
> Hmmm, pci_pm_reset() puts the device in D3hot, then back to D0.  Spec
> says that results in "undefined internal Function state," which
> doesn't even sound like a guaranteed reset, but it's what we have, and
> in any case, it does not disable the link.
> 
> > We might be able to use the "device_specific" method and add quirks in
> > "pci_dev_reset_methods". However, I'm not sure what would be the
> > benefit, as it basically means moving the code in
> > mlxsw_pci_link_toggle() to drivers/pci/quirks.c. Also, when the "probe"
> > argument is "true" we can't actually determine if this reset method is
> > supported or not, as we can't query that from the configuration space of
> > the device in the current implementation. It's queried using a command
> > interface that is specific to mlxsw and resides in the driver itself.
> > Not usable from drivers/pci/quirks.c.  
> 
> Spec (PCIe r6.0, sec 6.6.1) says "Disabling a Link causes Downstream
> components to undergo a hot reset."  That seems like it *could* be a
> general-purpose method of doing a reset, and I don't know why the PCI
> core doesn't support it.  Added Alex and Lukas in case they know.

I think we don't have it because it's unclear how it's actually
different from a secondary bus reset from the bridge control register,
which is what "bus" would do when selected for the example above.  Per
the spec, both must cause a hot reset.  It seems this device needs a
significantly longer delay though.

Note that hot resets can be generated by a userspace driver with
ownership of the device and will make use of the pci-core reset
mechanisms.  Therefore if there is not a device specific reset, we'll
use the standard delays and the device ought not to get itself wedged
if the link becomes active at an unexpected point relative to a
firmware update.  This might be a point in favor of a device specific
reset solution in pci-core.  Thanks,

Alex
 
> But it sounds like there's some wrinkle with your device?  I suppose a
> link disable actually causes a reset, but that reset may not trigger
> the firmware reload you need?  If we had a generic "disable link"
> reset method, maybe a device quirk could disable it if necessary?
> 
> > > Of course, pci_reset_function() does *not* include details like
> > > MLXSW_PCI_TOGGLE_WAIT_BEFORE_EN_MSECS.
> > > 
> > > I assume that flashing the firmware to the device followed by a power
> > > cycle (without ever doing MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE)
> > > would load the new firmware everywhere.  Can we not do the same with a
> > > PCIe reset?  
> > 
> > Yes, that's what we would like to achieve.
> > 
> > Thanks for the feedback!  
> 




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux