Re: [PATCH rdma-rc] RDMA/mlx5: Fix access to wrong pointer while performing flush due to error

On Wed, Mar 18, 2020 at 11:16:40AM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> 
> The main difference between send and receive SW completions is how each
> one walks its work queue (WQ). For receive completions, the initial index
> to be flushed is stored in "tail", while for send completions it was kept
> in "last_poll", a field that has since been deleted.
> 
> [62954.657039] CPU: 54 PID: 53405 Comm: kworker/u161:0 Kdump: loaded Tainted: G           OE    --------- -t - 4.18.0-147.el8.ppc64le #1
> [62954.657170] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
> [62954.657234] NIP:  c000003c7c00a000 LR: c00800000e586af4 CTR: c000003c7c00a000
> [62954.657307] REGS: c0000036cc9db940 TRAP: 0400   Tainted: G           OE    --------- -t -  (4.18.0-147.el8.ppc64le)
> [62954.657403] MSR:  9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004488  XER: 20040000
> [62954.657481] CFAR: c00800000e586af0 IRQMASK: 0
> GPR00: c00800000e586ab4 c0000036cc9dbbc0 c00800000e5f1a00 c0000037d8433800
> GPR04: c000003895a26800 c0000037293f2000 0000000000000201 0000000000000011
> GPR08: c000003895a26c80 c000003c7c00a000 0000000000000000 c00800000ed30438
> GPR12: c000003c7c00a000 c000003fff684b80 c00000000017c388 c00000396ec4be40
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: c00000000151e498 0000000000000010 c000003895a26848 0000000000000010
> GPR24: 0000000000000010 0000000000010000 c000003895a26800 0000000000000000
> GPR28: 0000000000000010 c0000037d8433800 c000003895a26c80 c000003895a26800
> [62954.658513] NIP [c000003c7c00a000] 0xc000003c7c00a000
> [62954.658634] LR [c00800000e586af4] __ib_process_cq+0xec/0x1b0 [ib_core]
> [62954.658750] Call Trace:
> [62954.658806] [c0000036cc9dbbc0] [c00800000e586ab4] __ib_process_cq+0xac/0x1b0 [ib_core] (unreliable)
> [62954.658974] [c0000036cc9dbc40] [c00800000e586c88] ib_cq_poll_work+0x40/0xb0 [ib_core]
> [62954.659114] [c0000036cc9dbc70] [c000000000171f44] process_one_work+0x2f4/0x5c0
> [62954.659256] [c0000036cc9dbd10] [c000000000172a0c] worker_thread+0xcc/0x760
> [62954.659388] [c0000036cc9dbdc0] [c00000000017c52c] kthread+0x1ac/0x1c0
> [62954.659521] [c0000036cc9dbe30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80
> [62954.659660] Instruction dump:
> [62954.659735] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> [62954.659886] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> [62954.660040] ---[ end trace cece1d14044f024d ]---
> [62954.678250]
> [62954.678335] Sending IPI to other CPUs
> [62955.479581] IPI complete
> 
> Fixes: 8e3b68830186 ("RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion")
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> ---
>  drivers/infiniband/hw/mlx5/cq.c      | 27 +++++++++++++++++++++++++--
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
>  drivers/infiniband/hw/mlx5/qp.c      |  1 +
>  3 files changed, 27 insertions(+), 2 deletions(-)

Applied to for-rc, thanks

Jason
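
The tail/last_poll distinction in the quoted commit message can be sketched with a toy model. This is not the mlx5 driver code: struct toy_wq, enum toy_dir, and toy_sw_flush() are hypothetical names, and the free-running index scheme is an assumption made only for illustration. The point it shows is that a flush which starts from the wrong index reports the wrong work requests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified work queue; NOT the mlx5 driver's structure. */
struct toy_wq {
	unsigned int head;       /* next slot the producer will post to */
	unsigned int tail;       /* receive side: oldest entry not yet completed */
	unsigned int last_poll;  /* send side: oldest entry not yet completed */
	unsigned int wqe_cnt;    /* ring size, assumed to be a power of two */
	unsigned long long wrid[8];
};

enum toy_dir { TOY_SEND, TOY_RECV };

/*
 * Generate software "flush" completions for every outstanding entry.
 * The start index depends on the direction: "tail" for receive,
 * "last_poll" for send. Returns the number of work request IDs
 * written to out[].
 */
static int toy_sw_flush(struct toy_wq *wq, enum toy_dir dir,
			unsigned long long *out, int max)
{
	unsigned int *start = (dir == TOY_RECV) ? &wq->tail : &wq->last_poll;
	int n = 0;

	/* The masking below only works for a power-of-two ring size. */
	assert(wq->wqe_cnt != 0 && (wq->wqe_cnt & (wq->wqe_cnt - 1)) == 0);

	while (n < max && *start != wq->head) {
		out[n++] = wq->wrid[*start & (wq->wqe_cnt - 1)];
		(*start)++;
	}
	return n;
}
```

With head = 5, last_poll = 3, and tail = 0 on a send queue, flushing from last_poll yields the two entries at indices 3 and 4. Starting from tail instead would also report indices 0 through 2, which in this model have already been completed.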
