On Mon, May 21, 2018 at 11:41:01AM +0300, Leon Romanovsky wrote:
> From: Erez Shitrit <erezsh@xxxxxxxxxxxx>
>
> On fatal error the driver simulates CQE's for ULPs that rely on
> completion of all their posted work-request.
>
> For the GSI traffic, the mlx5 has its own mechanism that sends the
> completions via software CQE's directly to the relevant CQ.
>
> This should be kept in fatal error too, so the driver should simulate
> such CQE's with the specified error state in order to complete GSI QP
> work requests.
>
> Without the fix the next deadlock might appears:
>   schedule_timeout+0x274/0x350
>   wait_for_common+0xec/0x240
>   mcast_remove_one+0xd0/0x120 [ib_core]
>   ib_unregister_device+0x12c/0x230 [ib_core]
>   mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
>   mlx5_detach_device+0x184/0x1a0 [mlx5_core]
>   mlx5_unload_one+0x308/0x340 [mlx5_core]
>   mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]
>
> Cc: <stable@xxxxxxxxxxxxxxx> # 4.7
> Fixes: 89ea94a7b6c4 ("IB/mlx5: Reset flow support for IB kernel ULPs")
> Signed-off-by: Erez Shitrit <erezsh@xxxxxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> ---
> Hi,
>
> I'm sending this to rdma-next, because anyway it is going to stable@ and
> this is "old" bug and not important enough for -rc6.
> Thanks
> ---
>  drivers/infiniband/hw/mlx5/cq.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)

Applied to for-next, thanks

Jason