RE: [PATCH] Revert "RDMA/rxe: Remove unnecessary mr testing"

On Fri, Dec 2, 2022 11:43 PM Jason Gunthorpe wrote:
> On Fri, Dec 02, 2022 at 02:35:01PM +0000, lizhijian@xxxxxxxxxxx wrote:
> >
> >
> > on 12/2/2022 7:45 PM, Zhu Yanjun wrote:
> > > On Fri, Dec 2, 2022 at 7:02 PM Daisuke Matsuda
> > > <matsuda-daisuke@xxxxxxxxxxx> wrote:
> > >>
> > >> The commit 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") causes
> > >> a kernel crash. If responder get a zero-byte RDMA Read request, qp->resp.mr
> > >> is not set in check_rkey(). The mr is NULL in this case, and a NULL pointer
> > >> dereference occurs as shown below.
> > >>
> > >> [  139.607580] BUG: kernel NULL pointer dereference, address: 0000000000000010
> > >> [  139.609169] #PF: supervisor write access in kernel mode
> > >> [  139.610314] #PF: error_code(0x0002) - not-present page
> > >> [  139.611434] PGD 0 P4D 0
> > >> [  139.612031] Oops: 0002 [#1] PREEMPT SMP PTI
> > >> [  139.612975] CPU: 2 PID: 3622 Comm: python3 Kdump: loaded Not tainted 6.1.0-rc3+ #34
> > >> [  139.614465] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> > >> [  139.616142] RIP: 0010:__rxe_put+0xc/0x60 [rdma_rxe]
> > >> [  139.617065] Code: cc cc cc 31 f6 e8 64 36 1b d3 41 b8 01 00 00 00 44 89 c0 c3 cc cc cc cc 41 89 c0 eb c1 90 0f 1f 44 00 00 41 54 b8 ff ff ff ff <f0> 0f c1 47 10 83 f8 01 74 11 45 31 e4 85 c0 7e 20 44 89 e0 41 5c
> > >> [  139.620451] RSP: 0018:ffffb27bc012ce78 EFLAGS: 00010246
> > >> [  139.621413] RAX: 00000000ffffffff RBX: ffff9790857b0580 RCX: 0000000000000000
> > >> [  139.622718] RDX: ffff979080fe145a RSI: 000055560e3e0000 RDI: 0000000000000000
> > >> [  139.624025] RBP: ffff97909c7dd800 R08: 0000000000000001 R09: e7ce43d97f7bed0f
> > >> [  139.625328] R10: ffff97908b29c300 R11: 0000000000000000 R12: 0000000000000000
> > >> [  139.626632] R13: 0000000000000000 R14: ffff97908b29c300 R15: 0000000000000000
> > >> [  139.627941] FS:  00007f276f7bd740(0000) GS:ffff9792b5c80000(0000) knlGS:0000000000000000
> > >> [  139.629418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [  139.630480] CR2: 0000000000000010 CR3: 0000000114230002 CR4: 0000000000060ee0
> > >> [  139.631805] Call Trace:
> > >> [  139.632288]  <IRQ>
> > >> [  139.632688]  read_reply+0xda/0x310 [rdma_rxe]
> > >> [  139.633515]  rxe_responder+0x82d/0xe50 [rdma_rxe]
> > >> [  139.634398]  do_task+0x84/0x170 [rdma_rxe]
> > >> [  139.635187]  tasklet_action_common.constprop.0+0xa7/0x120
> > >> [  139.636189]  __do_softirq+0xcb/0x2ac
> > >> [  139.636877]  do_softirq+0x63/0x90
> > >> [  139.637505]  </IRQ>
> > >>
> > >> Link: https://lore.kernel.org/lkml/1666582315-2-1-git-send-email-lizhijian@xxxxxxxxxxx/
> > >> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
> >
> > Good catch, want to know what workload you are running.
> > I have never got it in pyverbs tests.

I found the issue when running my personal testcase for test_odp.py.

> >
> > Add a TODOs: add pyverbs test to cover this scenario.

Thankfully, Zhijian added one two days ago, but we should also have the RDMA Write counterpart.
Future changes could trigger a similar problem in write_data_in(), so I posted a test for that case.
cf. https://github.com/linux-rdma/rdma-core/pull/1269
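
For anyone reading along, here is a minimal userspace sketch of the failure pattern described in the commit message. It is not the rdma_rxe code: every name in it (fake_mr, fake_resp, fake_check_rkey, fake_put, fake_read_reply) is hypothetical, and the 16-byte padding is only an assumed layout chosen so that the counter sits at offset 0x10. That matches the oops as far as I can tell: the faulting bytes "<f0> 0f c1 47 10" look like a locked xadd at 0x10(%rdi), RDI is 0, and CR2 is 0x10. The sketch shows a lookup step that leaves the mr pointer NULL for a zero-length request, and a reply path that tests the pointer before dropping the reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the rdma_rxe structures. */
struct fake_mr {
	char pad[16];	/* assumed layout: puts the counter at offset 0x10 */
	int ref_cnt;
};

struct fake_resp {
	struct fake_mr *mr;	/* stays NULL when no MR lookup was done */
};

/* Models a lookup step that skips the MR for zero-length requests. */
static void fake_check_rkey(struct fake_resp *resp, struct fake_mr *mr,
			    size_t len)
{
	resp->mr = NULL;
	if (len == 0)
		return;		/* zero-byte request: no MR is taken */
	mr->ref_cnt++;
	resp->mr = mr;
}

/* Drops one reference; the caller must pass a non-NULL pointer. */
static void fake_put(struct fake_mr *mr)
{
	assert(mr != NULL);
	assert(mr->ref_cnt > 0);
	mr->ref_cnt--;
}

/*
 * Guarded reply path: returns 1 if a reference was dropped and 0 if
 * none was held. Without the NULL test, fake_put() would touch
 * offset 0x10 from a NULL base.
 */
static int fake_read_reply(struct fake_resp *resp)
{
	if (!resp->mr)
		return 0;
	fake_put(resp->mr);
	resp->mr = NULL;
	return 1;
}
```

Calling fake_check_rkey() with len == 0 and then fake_read_reply() returns 0 and leaves the counter untouched, while a non-zero length takes and drops exactly one reference.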

Daisuke

> 
> Yes please
> 
> Jason
