On Wed, 2018-01-10 at 11:26 -0700, Jason Gunthorpe wrote:
> On Wed, Jan 10, 2018 at 08:42:03AM -0500, Laurence Oberman wrote:
> 
> > [ 946.647514] kernel tried to execute NX-protected page - exploit
> > attempt? (uid: 0)
> > [ 946.691954] BUG: unable to handle kernel paging request at
> > 00000000a2129b93
> > [ 947.889552] Call Trace:
> > [ 947.903724]  ? __ib_process_cq+0x55/0xa0 [ib_core]
> > [ 947.931179]  ? ib_cq_poll_work+0x1b/0x60 [ib_core]
> > [ 947.958153]  ? process_one_work+0x141/0x340
> > [ 947.981362]  ? worker_thread+0x47/0x3e0
> > [ 948.002102]  ? kthread+0xf5/0x130
> > [ 948.020538]  ? rescuer_thread+0x380/0x380
> > [ 948.043180]  ? kthread_associate_blkcg+0x90/0x90
> > [ 948.070184]  ? ret_from_fork+0x1f/0x30
> 
> These oops's you have are very suggestive that ib_wc->wr_cqe
> is garbage..
> 
> Did SRP free its wr_cqe data before completion somehow?
> 
> Turn on slab poisoning to confirm?

Hello Jason,

It's easy to see in drivers/infiniband/core/cq.c that polling is stopped
before a completion queue is destroyed (see also the
cancel_work_sync(&cq->work) and the cq->device->destroy_cq(cq) calls in
ib_free_cq()).

BTW, I run all my tests with SLAB poisoning enabled. My SRP tests pass if
I run the SRP initiator and target drivers on top of the mlx4 and rdma_rxe
drivers.

Bart.
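For readers less familiar with the teardown ordering described above, here is a minimal userspace sketch of the same idea. It assumes POSIX threads and C11 atomics. The names `toy_cq`, `toy_alloc_cq()`, `toy_poll_work()` and `toy_free_cq()` are hypothetical and are not part of the RDMA core API. The sketch first waits synchronously for the poller to finish, with `pthread_join()` standing in for what `cancel_work_sync()` does for `cq->work`. Only after that does it free the structure, so the poller can never touch freed memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a completion queue; not the kernel's struct ib_cq. */
struct toy_cq {
    atomic_bool stop;    /* set by the teardown path to ask the poller to exit */
    atomic_int polls;    /* number of poll iterations performed */
    pthread_t worker;    /* plays the role of the CQ's polling work item */
};

/* Poller loop: stands in for a work function that processes completions. */
static void *toy_poll_work(void *arg)
{
    struct toy_cq *cq = arg;

    while (!atomic_load(&cq->stop))
        atomic_fetch_add(&cq->polls, 1);
    return NULL;
}

static struct toy_cq *toy_alloc_cq(void)
{
    struct toy_cq *cq = malloc(sizeof(*cq));

    if (!cq)
        return NULL;
    atomic_init(&cq->stop, false);
    atomic_init(&cq->polls, 0);
    if (pthread_create(&cq->worker, NULL, toy_poll_work, cq) != 0) {
        free(cq);
        return NULL;
    }
    return cq;
}

/*
 * Teardown in the order described for ib_free_cq(): stop polling and wait
 * until the poller has really finished (pthread_join() here, analogous to
 * cancel_work_sync()), and only then release the memory (analogous to
 * destroy_cq()). Returns the number of poll iterations observed.
 */
static int toy_free_cq(struct toy_cq *cq)
{
    int polls;

    atomic_store(&cq->stop, true);
    pthread_join(cq->worker, NULL);  /* after this no thread can access cq */
    polls = atomic_load(&cq->polls);
    free(cq);
    return polls;
}
```

If the two steps in `toy_free_cq()` were swapped, the poller could dereference freed memory. That is the kind of use-after-free that slab poisoning is meant to expose.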