Jesse Brandeburg <jesse.brandeburg@xxxxxxxxx> wrote:

>When the system boots into the crash dump kernel after a panic, the ice
>networking device may still have pending transactions that can cause errors
>or machine checks when the device is re-enabled. This can prevent the crash
>dump kernel from loading the driver or collecting the crash data.
>
>To avoid this issue, perform a function level reset (FLR) on the ice device
>via PCIe config space before enabling it on the crash kernel. This will
>clear any outstanding transactions and stop all queues and interrupts.
>Restore the config space after the FLR, otherwise it was found in testing
>that the driver wouldn't load successfully.

	How does this differ from adding "reset_devices" to the crash
kernel command line, per Documentation/admin-guide/kdump/kdump.rst?

	-J

>The following sequence causes the original issue:
>- Load the ice driver with modprobe ice
>- Enable SR-IOV with 2 VFs: echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
>- Trigger a crash with echo c > /proc/sysrq-trigger
>- Load the ice driver again (or let it load automatically) with modprobe ice
>- The system crashes again during pcim_enable_device()
>
>Reported-by: Vishal Agrawal <vagrawal@xxxxxxxxxx>
>Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@xxxxxxxxx>
>Signed-off-by: Jesse Brandeburg <jesse.brandeburg@xxxxxxxxx>
>---
>v2: respond to list comments and update commit message
>v1: initial version
>---
> drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
>diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
>index c8286adae946..6550c46e4e36 100644
>--- a/drivers/net/ethernet/intel/ice/ice_main.c
>+++ b/drivers/net/ethernet/intel/ice/ice_main.c
>@@ -6,6 +6,7 @@
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> 
> #include <generated/utsrelease.h>
>+#include <linux/crash_dump.h>
> #include "ice.h"
> #include "ice_base.h"
> #include "ice_lib.h"
>@@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
> 		return -EINVAL;
> 	}
> 
>+	/* when under a kdump kernel initiate a reset before enabling the
>+	 * device in order to clear out any pending DMA transactions. These
>+	 * transactions can cause some systems to machine check when doing
>+	 * the pcim_enable_device() below.
>+	 */
>+	if (is_kdump_kernel()) {
>+		pci_save_state(pdev);
>+		pci_clear_master(pdev);
>+		err = pcie_flr(pdev);
>+		if (err)
>+			return err;
>+		pci_restore_state(pdev);
>+	}
>+
> 	/* this driver uses devres, see
> 	 * Documentation/driver-api/driver-model/devres.rst
> 	 */
>
>base-commit: 6a70e5cbedaf8ad10528ac9ac114f3ec20f422df
>-- 
>2.39.3
>

---
	-Jay Vosburgh, jay.vosburgh@xxxxxxxxxxxxx