Patch "net/tg3: fix race condition in tg3_reset_task()" has been added to the 4.19-stable tree

This is a note to let you know that I've just added the patch titled

    net/tg3: fix race condition in tg3_reset_task()

to the 4.19-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     net-tg3-fix-race-condition-in-tg3_reset_task.patch
and it can be found in the queue-4.19 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 1b29912a320e06a0d3d257ccc5a7a419f1f07ab5
Author: Thinh Tran <thinhtr@xxxxxxxxxxxxxxxxxx>
Date:   Thu Nov 30 18:19:11 2023 -0600

    net/tg3: fix race condition in tg3_reset_task()
    
    [ Upstream commit 16b55b1f2269962fb6b5154b8bf43f37c9a96637 ]
    
    When an EEH error is encountered by a PCI adapter, the EEH driver
    modifies the PCI channel's state as shown below:
    
       enum {
          /* I/O channel is in normal state */
          pci_channel_io_normal = (__force pci_channel_state_t) 1,
    
          /* I/O to channel is blocked */
          pci_channel_io_frozen = (__force pci_channel_state_t) 2,
    
          /* PCI card is dead */
          pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
       };
    
    If the same EEH error then causes the tg3 driver's transmit timeout
    logic to execute, the tg3_tx_timeout() function schedules a reset
    task via tg3_reset_task_schedule(). This can race with the EEH
    driver, because both then attempt to recover the hardware via a
    reset action:
    
    EEH driver gets error event
    --> eeh_set_channel_state()
        sets the device to one of
        the error states above      scheduler: tg3_reset_task() gets
                                    an error returned from tg3_init_hw()
                                 --> dev_close() shuts down the interface
    tg3_io_slot_reset() and
    tg3_io_resume() fail to
    reset/resume the device
    
    To resolve this issue, avoid the race by checking the PCI channel
    state in tg3_reset_task() and skipping the tg3 driver-initiated reset
    when the PCI channel is not in the normal state.  (The driver has no
    access to tg3 device registers at this point and cannot even complete
    the reset task successfully without external assistance.)  The reset
    procedure is left to the EEH driver, which calls
    tg3_io_error_detected(), tg3_io_slot_reset() and tg3_io_resume() as
    appropriate.
    
    Add the same check to tg3_dump_state() to avoid dumping all device
    registers when the PCI channel is not in the normal state.
    
    Signed-off-by: Thinh Tran <thinhtr@xxxxxxxxxxxxxxxxxx>
    Tested-by: Venkata Sai Duggi <venkata.sai.duggi@xxxxxxx>
    Reviewed-by: David Christensen <drc@xxxxxxxxxxxxxxxxxx>
    Reviewed-by: Michael Chan <michael.chan@xxxxxxxxxxxx>
    Link: https://lore.kernel.org/r/20231201001911.656-1-thinhtr@xxxxxxxxxxxxxxxxxx
    Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 68bb4a2ff7ce..af0186a527a3 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6449,6 +6449,14 @@ static void tg3_dump_state(struct tg3 *tp)
 	int i;
 	u32 *regs;
 
+	/* If it is a PCI error, all registers will be 0xffff,
+	 * we don't dump them out, just report the error and return
+	 */
+	if (tp->pdev->error_state != pci_channel_io_normal) {
+		netdev_err(tp->dev, "PCI channel ERROR!\n");
+		return;
+	}
+
 	regs = kzalloc(TG3_REG_BLK_SIZE, GFP_ATOMIC);
 	if (!regs)
 		return;
@@ -11199,7 +11207,8 @@ static void tg3_reset_task(struct work_struct *work)
 	rtnl_lock();
 	tg3_full_lock(tp, 0);
 
-	if (tp->pcierr_recovery || !netif_running(tp->dev)) {
+	if (tp->pcierr_recovery || !netif_running(tp->dev) ||
+	    tp->pdev->error_state != pci_channel_io_normal) {
 		tg3_flag_clear(tp, RESET_TASK_PENDING);
 		tg3_full_unlock(tp);
 		rtnl_unlock();
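
The two guards added by the diff above can be modelled outside the kernel
to make the decision logic easier to follow. The sketch below is only an
illustration in user-space C, not driver code: struct model_dev, the
CHANNEL_IO_* names, reset_should_skip() and dump_should_skip() are
hypothetical stand-ins invented here for the tp->pcierr_recovery,
netif_running(tp->dev) and tp->pdev->error_state checks in the patch.

```c
/* User-space model only; these types mimic, but are not, the kernel's. */
#include <stdbool.h>

/* Same numeric values as the enum quoted in the commit message. */
enum channel_state {
	CHANNEL_IO_NORMAL = 1,       /* I/O channel is in normal state */
	CHANNEL_IO_FROZEN = 2,       /* I/O to channel is blocked */
	CHANNEL_IO_PERM_FAILURE = 3, /* PCI card is dead */
};

/* Hypothetical stand-in for the state the patched code consults. */
struct model_dev {
	bool pcierr_recovery;           /* models tp->pcierr_recovery */
	bool running;                   /* models netif_running(tp->dev) */
	enum channel_state error_state; /* models tp->pdev->error_state */
};

/*
 * Models the condition in tg3_reset_task() after the patch: the
 * driver-initiated reset is skipped if error recovery is already in
 * progress, the interface is down, or the channel is not normal.
 */
static bool reset_should_skip(const struct model_dev *d)
{
	return d->pcierr_recovery || !d->running ||
	       d->error_state != CHANNEL_IO_NORMAL;
}

/*
 * Models the early return added to tg3_dump_state(): no register dump
 * is attempted unless the channel is in the normal state.
 */
static bool dump_should_skip(const struct model_dev *d)
{
	return d->error_state != CHANNEL_IO_NORMAL;
}
```

In this model, a device that is running with a frozen channel skips both
the reset and the register dump, which corresponds to leaving recovery to
the EEH driver as the commit message describes.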
