Re: Issue after 5.4.70->5.4.77 update: mlx5_core: reg_mr_callback: async reg mr failed. status -11

On Tue, Nov 17, 2020 at 04:54:30PM +0100, Timo Rothenpieler wrote:
> The most likely candidate for this seems to be
> 0ec52f0194638e2d284ad55eba5a7aff753de1b9 (RDMA/mlx5: Disable
> IB_DEVICE_MEM_MGT_EXTENSIONS if IB_WR_REG_MR can't work), which was merged
> in 5.4.73. There were also a lot of mlx5-related changes in 5.4.71, though.
> Since this is a production system, I cannot sensibly bisect this.

It is very unlikely; neither mlx5 nor ipoib reads that bit.

That error print is very bad:

  Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: cmd_work_handler:887:(pid 383): failed to allocate command entry

It really shouldn't happen.

This is more likely the cause:

commit 073fff8102062cd675170ceb54d90da22fe7e668
Author: Eran Ben Elisha <eranbe@xxxxxxxxxxxx>
Date:   Tue Aug 4 10:40:21 2020 +0300

    net/mlx5: Avoid possible free of command entry while timeout comp handler
    
    [ Upstream commit 50b2412b7e7862c5af0cbf4b10d93bc5c712d021 ]
    
    Upon command completion timeout, driver simulates a forced command
    completion. In a rare case where real interrupt for that command arrives
    simultaneously, it might release the command entry while the forced
    handler might still access it.

Most likely the backport is missing some element.

Eran, can you check why v5.4.77 is totally broken?

Jason


