Patch "igb: clean up in all error paths when enabling SR-IOV" has been added to the 6.5-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    igb: clean up in all error paths when enabling SR-IOV

to the 6.5-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     igb-clean-up-in-all-error-paths-when-enabling-sr-iov.patch
and it can be found in the queue-6.5 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 03a1320d59100206c53fd1903931325f38b0c1c5
Author: Corinna Vinschen <vinschen@xxxxxxxxxx>
Date:   Mon Sep 11 13:28:49 2023 -0700

    igb: clean up in all error paths when enabling SR-IOV
    
    [ Upstream commit bc6ed2fa24b14e40e1005488bbe11268ce7108fa ]
    
    After commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), removing
    the igb module could hang or crash (depending on the machine) when the
    module has been loaded with the max_vfs parameter set to some value != 0.
    
    In case of one test machine with a dual port 82580, this hang occurred:
    
    [  232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1
    [  233.093257] igb 0000:41:00.1: IOV Disabled
    [  233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0
    [  233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata)
    [  233.352248] igb 0000:41:00.0:   device [8086:1516] error status/mask=00100000
    [  233.361088] igb 0000:41:00.0:    [20] UnsupReq               (First)
    [  233.368183] igb 0000:41:00.0: AER:   TLP Header: 40000001 0000040f cdbfc00c c
    [  233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata)
    [  233.388779] igb 0000:41:00.1:   device [8086:1516] error status/mask=00100000
    [  233.397629] igb 0000:41:00.1:    [20] UnsupReq               (First)
    [  233.404736] igb 0000:41:00.1: AER:   TLP Header: 40000001 0000040f cdbfc00c c
    [  233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback)
    [  233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0
    [  233.546197] pcieport 0000:40:01.0: AER: device recovery failed
    [  234.157244] igb 0000:41:00.0: IOV Disabled
    [  371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds.
    [  371.627489]       Not tainted 6.4.0-dirty #2
    [  371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
    [  371.641000] task:irq/35-aerdrv   state:D stack:0     pid:257   ppid:2      f0
    [  371.650330] Call Trace:
    [  371.653061]  <TASK>
    [  371.655407]  __schedule+0x20e/0x660
    [  371.659313]  schedule+0x5a/0xd0
    [  371.662824]  schedule_preempt_disabled+0x11/0x20
    [  371.667983]  __mutex_lock.constprop.0+0x372/0x6c0
    [  371.673237]  ? __pfx_aer_root_reset+0x10/0x10
    [  371.678105]  report_error_detected+0x25/0x1c0
    [  371.682974]  ? __pfx_report_normal_detected+0x10/0x10
    [  371.688618]  pci_walk_bus+0x72/0x90
    [  371.692519]  pcie_do_recovery+0xb2/0x330
    [  371.696899]  aer_process_err_devices+0x117/0x170
    [  371.702055]  aer_isr+0x1c0/0x1e0
    [  371.705661]  ? __set_cpus_allowed_ptr+0x54/0xa0
    [  371.710723]  ? __pfx_irq_thread_fn+0x10/0x10
    [  371.715496]  irq_thread_fn+0x20/0x60
    [  371.719491]  irq_thread+0xe6/0x1b0
    [  371.723291]  ? __pfx_irq_thread_dtor+0x10/0x10
    [  371.728255]  ? __pfx_irq_thread+0x10/0x10
    [  371.732731]  kthread+0xe2/0x110
    [  371.736243]  ? __pfx_kthread+0x10/0x10
    [  371.740430]  ret_from_fork+0x2c/0x50
    [  371.744428]  </TASK>
    
    The reproducer was a simple script:
    
      #!/bin/sh
      for i in `seq 1 5`; do
        modprobe -rv igb
        modprobe -v igb max_vfs=1
        sleep 1
        modprobe -rv igb
      done
    
    It turned out that this could only be reproduce on 82580 (quad and
    dual-port), but not on 82576, i350 and i210.  Further debugging showed
    that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because
    dev->is_physfn is 0 on 82580.
    
    Prior to commit 50f303496d92 ("igb: Enable SR-IOV after reinit"),
    igb_enable_sriov() jumped into the "err_out" cleanup branch.  After this
    commit it only returned the error code.
    
    So the cleanup didn't take place, and the incorrect VF setup in the
    igb_adapter structure fooled the igb driver into assuming that VFs have
    been set up where no VF actually existed.
    
    Fix this problem by cleaning up again if pci_enable_sriov() fails.
    
    Fixes: 50f303496d92 ("igb: Enable SR-IOV after reinit")
    Signed-off-by: Corinna Vinschen <vinschen@xxxxxxxxxx>
    Reviewed-by: Akihiko Odaki <akihiko.odaki@xxxxxxxxxx>
    Tested-by: Rafal Romanowski <rafal.romanowski@xxxxxxxxx>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@xxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index ac19730e8db91..12f106b3a878b 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3827,8 +3827,11 @@ static int igb_enable_sriov(struct pci_dev *pdev, int num_vfs, bool reinit)
 	}
 
 	/* only call pci_enable_sriov() if no VFs are allocated already */
-	if (!old_vfs)
+	if (!old_vfs) {
 		err = pci_enable_sriov(pdev, adapter->vfs_allocated_count);
+		if (err)
+			goto err_out;
+	}
 
 	goto out;
 



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux