Patch "net/mlx5: Discard command completions in internal error" has been added to the 6.9-stable tree

This is a note to let you know that I've just added the patch titled

    net/mlx5: Discard command completions in internal error

to the 6.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     net-mlx5-discard-command-completions-in-internal-err.patch
and it can be found in the queue-6.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 9c204d542323062cda95d0163cd3011d53dc2fbc
Author: Akiva Goldberger <agoldberger@xxxxxxxxxx>
Date:   Thu May 9 14:29:51 2024 +0300

    net/mlx5: Discard command completions in internal error
    
    [ Upstream commit db9b31aa9bc56ff0d15b78f7e827d61c4a096e40 ]
    
    Fix use after free when FW completion arrives while device is in
    internal error state. Avoid calling completion handler in this case,
    since the device will flush the command interface and trigger all
    completions manually.
    
    Kernel log:
    ------------[ cut here ]------------
    refcount_t: underflow; use-after-free.
    ...
    RIP: 0010:refcount_warn_saturate+0xd8/0xe0
    ...
    Call Trace:
    <IRQ>
    ? __warn+0x79/0x120
    ? refcount_warn_saturate+0xd8/0xe0
    ? report_bug+0x17c/0x190
    ? handle_bug+0x3c/0x60
    ? exc_invalid_op+0x14/0x70
    ? asm_exc_invalid_op+0x16/0x20
    ? refcount_warn_saturate+0xd8/0xe0
    cmd_ent_put+0x13b/0x160 [mlx5_core]
    mlx5_cmd_comp_handler+0x5f9/0x670 [mlx5_core]
    cmd_comp_notifier+0x1f/0x30 [mlx5_core]
    notifier_call_chain+0x35/0xb0
    atomic_notifier_call_chain+0x16/0x20
    mlx5_eq_async_int+0xf6/0x290 [mlx5_core]
    notifier_call_chain+0x35/0xb0
    atomic_notifier_call_chain+0x16/0x20
    irq_int_handler+0x19/0x30 [mlx5_core]
    __handle_irq_event_percpu+0x4b/0x160
    handle_irq_event+0x2e/0x80
    handle_edge_irq+0x98/0x230
    __common_interrupt+0x3b/0xa0
    common_interrupt+0x7b/0xa0
    </IRQ>
    <TASK>
    asm_common_interrupt+0x22/0x40
    
    Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling")
    Signed-off-by: Akiva Goldberger <agoldberger@xxxxxxxxxx>
    Reviewed-by: Moshe Shemesh <moshe@xxxxxxxxxx>
    Signed-off-by: Tariq Toukan <tariqt@xxxxxxxxxx>
    Link: https://lore.kernel.org/r/20240509112951.590184-6-tariqt@xxxxxxxxxx
    Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 511e7fee39ac5..20768ef2e9d2b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1634,6 +1634,9 @@ static int cmd_comp_notifier(struct notifier_block *nb,
 	dev = container_of(cmd, struct mlx5_core_dev, cmd);
 	eqe = data;
 
+	if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
+		return NOTIFY_DONE;
+
 	mlx5_cmd_comp_handler(dev, be32_to_cpu(eqe->data.cmd.vector), false);
 
 	return NOTIFY_OK;
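
The race described in the commit message can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions, not the driver's code: every `sketch_*` and `SKETCH_*` name is hypothetical, the per-slot integer counter stands in for the real command entry refcount, and the `guard` parameter exists only so the behaviour with and without the added state check can be compared.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SLOTS 8

/* Hypothetical stand-ins for NOTIFY_DONE / NOTIFY_OK. */
enum { SKETCH_NOTIFY_DONE = 0, SKETCH_NOTIFY_OK = 1 };

enum sketch_dev_state { SKETCH_STATE_UP, SKETCH_STATE_INTERNAL_ERROR };

struct sketch_dev {
	enum sketch_dev_state state;
	int refcount[SKETCH_SLOTS];	/* simplified: one count per command slot */
	int underflows;			/* puts attempted on an already released slot */
};

/* Drop one reference; record an underflow instead of going negative. */
static void sketch_ent_put(struct sketch_dev *dev, int slot)
{
	if (dev->refcount[slot] == 0) {
		dev->underflows++;
		return;
	}
	dev->refcount[slot]--;
}

/* Complete every slot whose bit is set in the vector. */
static void sketch_comp_handler(struct sketch_dev *dev, uint32_t vector)
{
	for (int slot = 0; slot < SKETCH_SLOTS; slot++)
		if (vector & (1u << slot))
			sketch_ent_put(dev, slot);
}

/* Error path: complete all outstanding commands manually. */
static void sketch_flush(struct sketch_dev *dev)
{
	uint32_t vector = 0;

	for (int slot = 0; slot < SKETCH_SLOTS; slot++)
		if (dev->refcount[slot] > 0)
			vector |= 1u << slot;
	sketch_comp_handler(dev, vector);
}

/* Completion event entry point; guard != 0 models the added state check. */
static int sketch_comp_notifier(struct sketch_dev *dev, uint32_t vector,
				int guard)
{
	if (guard && dev->state == SKETCH_STATE_INTERNAL_ERROR)
		return SKETCH_NOTIFY_DONE;

	sketch_comp_handler(dev, vector);
	return SKETCH_NOTIFY_OK;
}

/* One in-flight command, an error flush, then a late completion event. */
static int sketch_race(int guard)
{
	struct sketch_dev dev = { .state = SKETCH_STATE_UP };

	dev.refcount[0] = 1;				/* command in flight */
	dev.state = SKETCH_STATE_INTERNAL_ERROR;	/* device enters error state */
	sketch_flush(&dev);				/* manual completion of slot 0 */
	sketch_comp_notifier(&dev, 1u << 0, guard);	/* late event for slot 0 */
	return dev.underflows;
}
```

In this model, `sketch_race(0)` records one underflow because the late event releases a slot that the flush already released, while `sketch_race(1)` records none because the event is discarded once the device is in the error state.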



