Re: Issue after 5.4.70->5.4.77 update: mlx5_core: reg_mr_callback: async reg mr failed. status -11

On 11/17/2020 9:50 PM, jgg@xxxxxxxx wrote:
On Tue, Nov 17, 2020 at 04:54:30PM +0100, Timo Rothenpieler wrote:
The most likely candidate for this seems to be
0ec52f0194638e2d284ad55eba5a7aff753de1b9(RDMA/mlx5: Disable
IB_DEVICE_MEM_MGT_EXTENSIONS if IB_WR_REG_MR can't work)  which was merged
in 5.4.73. There were also a lot of mlx5 related changes in 5.4.71 though.
Though since this is a production system, I cannot sensibly bisect this.

It is very unlikely; neither mlx5 nor ipoib reads that bit.

That error print is very bad:

   Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: cmd_work_handler:887:(pid 383): failed to allocate command entry

It really shouldn't happen

This is more likely the cause:

commit 073fff8102062cd675170ceb54d90da22fe7e668
Author: Eran Ben Elisha <eranbe@xxxxxxxxxxxx>
Date:   Tue Aug 4 10:40:21 2020 +0300

     net/mlx5: Avoid possible free of command entry while timeout comp handler

     [ Upstream commit 50b2412b7e7862c5af0cbf4b10d93bc5c712d021 ]

     Upon command completion timeout, driver simulates a forced command
     completion. In a rare case where real interrupt for that command arrives
     simultaneously, it might release the command entry while the forced
     handler might still access it.

Most likely it is missing some element.

Eran, can you check why v5.4.77 is totally broken?

The linux-5.4.y branch is missing the fixes below:

1. 1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout
2. 410bd754cd73 net/mlx5: Add retry mechanism to the command entry ...

The second fix in particular matches Timo's bug report.
It does not directly fix the offending commit; however, the offending commit increased the probability of hitting this issue.

Saeed, can you notify the stable maintainers about this? I assume every stable branch should have all three of these commits or none of them.

Eran


Jason
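
The "all three commits or none of them" condition in Eran's reply can be checked mechanically. The following is a minimal sketch, not part of the original thread: it assumes the stable backports cite their upstream hash in the "[ Upstream commit <hash> ]" form quoted above, or in the "commit <hash> upstream." form also used in stable trees. The function names and the UPSTREAM_FIXES list are hypothetical; the two short hashes are kept abbreviated exactly as given in the thread and are matched by prefix.

```python
import re

# Upstream hashes named in the thread. The two short ones are abbreviated
# in the original mail and are deliberately left abbreviated here.
UPSTREAM_FIXES = [
    "50b2412b7e7862c5af0cbf4b10d93bc5c712d021",  # Avoid possible free of command entry ...
    "1d5558b1f0de",                              # poll cmd EQ in case of command timeout
    "410bd754cd73",                              # Add retry mechanism to the command entry ...
]

# Assumption: a backport cites its upstream commit in one of these two forms.
_UPSTREAM_RE = re.compile(
    r"\[ Upstream commit ([0-9a-f]{12,40}) \]"
    r"|^\s*commit ([0-9a-f]{12,40}) upstream\.?\s*$",
    re.MULTILINE,
)


def backported_hashes(log_text):
    """Return the set of upstream hashes cited anywhere in `git log` output."""
    return {a or b for a, b in _UPSTREAM_RE.findall(log_text)}


def check_fix_set(log_text, wanted=UPSTREAM_FIXES):
    """Report which wanted upstream commits are cited in the log.

    Returns (present, consistent): `present` maps each wanted hash to a bool,
    and `consistent` is True when all of them are present or none are.
    """
    found = backported_hashes(log_text)
    present = {
        w: any(f.startswith(w) or w.startswith(f) for f in found)
        for w in wanted
    }
    consistent = len(set(present.values())) <= 1
    return present, consistent
```

One way to use it would be to feed it the text of `git log` for the stable branch in question and look at the second return value: False means the branch carries only part of the set, which is the situation described for linux-5.4.y above.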



