Patch "mlxsw: pci: Fix possible crash during initialization" has been added to the 6.2-stable tree

This is a note to let you know that I've just added the patch titled

    mlxsw: pci: Fix possible crash during initialization

to the 6.2-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mlxsw-pci-fix-possible-crash-during-initialization.patch
and it can be found in the queue-6.2 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 55b9444c96aa7ff435acba3f60d6301845a5ec06
Author: Ido Schimmel <idosch@xxxxxxxxxx>
Date:   Mon Apr 17 18:52:51 2023 +0200

    mlxsw: pci: Fix possible crash during initialization
    
    [ Upstream commit 1f64757ee2bb22a93ec89b4c71707297e8cca0ba ]
    
    During initialization the driver issues a reset command via its command
    interface in order to remove previous configuration from the device.
    
    After issuing the reset, the driver waits for 200ms before polling on
    the "system_status" register using memory-mapped IO until the device
    reaches a ready state (0x5E). The wait is necessary because the reset
    command only triggers the reset, but the reset itself happens
    asynchronously. If the driver starts polling too soon, the read of the
    "system_status" register will never return and the system will crash
    [1].
    
    The issue was discovered when the device was flashed with a development
    firmware version where the reset routine took longer to complete. The
    issue was fixed in the firmware, but it exposed the fact that the
    current wait time is borderline.
    
    Fix by increasing the wait time from 200ms to 400ms. With this patch and
    the buggy firmware version, the issue did not reproduce in 10 reboots
    whereas without the patch the issue is reproduced quite consistently.
    
    [1]
    mce: CPUs not responding to MCE broadcast (may include false positives): 0,4
    mce: CPUs not responding to MCE broadcast (may include false positives): 0,4
    Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
    Shutting down cpus with NMI
    Kernel Offset: 0x12000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    
    Fixes: ac004e84164e ("mlxsw: pci: Wait longer before accessing the device after reset")
    Signed-off-by: Ido Schimmel <idosch@xxxxxxxxxx>
    Reviewed-by: Petr Machata <petrm@xxxxxxxxxx>
    Signed-off-by: Petr Machata <petrm@xxxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
index 48dbfea0a2a1d..7cdf0ce24f288 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
@@ -26,7 +26,7 @@
 #define MLXSW_PCI_CIR_TIMEOUT_MSECS		1000
 
 #define MLXSW_PCI_SW_RESET_TIMEOUT_MSECS	900000
-#define MLXSW_PCI_SW_RESET_WAIT_MSECS		200
+#define MLXSW_PCI_SW_RESET_WAIT_MSECS		400
 #define MLXSW_PCI_FW_READY			0xA1844
 #define MLXSW_PCI_FW_READY_MASK			0xFFFF
 #define MLXSW_PCI_FW_READY_MAGIC		0x5E


