On Wed, Jan 10, 2018 at 06:40:25PM +0000, Bart Van Assche wrote:
> On Wed, 2018-01-10 at 11:26 -0700, Jason Gunthorpe wrote:
> > On Wed, Jan 10, 2018 at 08:42:03AM -0500, Laurence Oberman wrote:
> >
> > > [ 946.647514] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> > > [ 946.691954] BUG: unable to handle kernel paging request at 00000000a2129b93
> > > [ 947.889552] Call Trace:
> > > [ 947.903724] ? __ib_process_cq+0x55/0xa0 [ib_core]
> > > [ 947.931179] ? ib_cq_poll_work+0x1b/0x60 [ib_core]
> > > [ 947.958153] ? process_one_work+0x141/0x340
> > > [ 947.981362] ? worker_thread+0x47/0x3e0
> > > [ 948.002102] ? kthread+0xf5/0x130
> > > [ 948.020538] ? rescuer_thread+0x380/0x380
> > > [ 948.043180] ? kthread_associate_blkcg+0x90/0x90
> > > [ 948.070184] ? ret_from_fork+0x1f/0x30
> >
> > These oops's you have are very suggestive that ib_wc->wr_cqe
> > is garbage..
> >
> > Did SRP free its wr_cqe data before completion somehow?
> >
> > Turn on slab poisoning to confirm?
>
> It's easy to see in drivers/infiniband/core/cq.c that polling is
> stopped before a completion queue is destroyed (see also the
> cancel_work_sync(&cq->work) and the cq->device->destroy_cq(cq) calls
> in ib_free_cq()).

But that has nothing directly to do with the lifetime of, say, struct
srp_request which contains ib_wc->wr_cqe?

eg freeing struct srp_request before the wrid has passed through the
CQ poll would produce these sorts of symptoms...

Jason
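
To make the lifetime concern concrete, here is a rough userspace sketch of the pattern in question. It is not kernel code and not the SRP driver: struct demo_request and all demo_* functions are hypothetical stand-ins, and the ib_* structs are cut-down models that only keep the wr_cqe/done fields. The assumption it illustrates is that the CQ poll loop calls through wc->wr_cqe->done(), so the object embedding the cqe has to stay allocated until its completion has been polled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Cut-down userspace models; only the fields relevant to the discussion. */
struct ib_cq;
struct ib_wc;

struct ib_cqe {
	void (*done)(struct ib_cq *cq, struct ib_wc *wc);
};

struct ib_wc {
	struct ib_cqe *wr_cqe;	/* points into memory owned by the requester */
};

struct ib_cq {
	struct ib_wc queue[8];
	int head, tail;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical request object with an embedded cqe (stand-in only). */
struct demo_request {
	int in_flight;
	int completed;
	struct ib_cqe cqe;
};

static void demo_done(struct ib_cq *cq, struct ib_wc *wc)
{
	struct demo_request *req =
		container_of(wc->wr_cqe, struct demo_request, cqe);

	(void)cq;
	req->in_flight = 0;
	req->completed = 1;
}

/* Model of posting a work request: the completion will carry &req->cqe. */
static void demo_post(struct ib_cq *cq, struct demo_request *req)
{
	req->cqe.done = demo_done;
	req->in_flight = 1;
	cq->queue[cq->tail++ % 8].wr_cqe = &req->cqe;
}

/* Model of the poll loop: an indirect call through the request's memory. */
static int demo_process_cq(struct ib_cq *cq)
{
	int n = 0;

	while (cq->head != cq->tail) {
		struct ib_wc *wc = &cq->queue[cq->head++ % 8];

		/* If the request were already freed, this would jump through garbage. */
		wc->wr_cqe->done(cq, wc);
		n++;
	}
	return n;
}

/* Refuse to free while a completion is still outstanding. */
static int demo_free(struct demo_request **req)
{
	if ((*req)->in_flight)
		return -1;
	free(*req);
	*req = NULL;
	return 0;
}

/* Returns 0 when the ordering behaves as described, 1 otherwise. */
static int demo_run(void)
{
	struct ib_cq cq = { .head = 0, .tail = 0 };
	struct demo_request *req = calloc(1, sizeof(*req));
	int early, polled, late;

	if (!req)
		return 1;
	demo_post(&cq, req);
	early = demo_free(&req);	/* rejected: completion not yet polled */
	polled = demo_process_cq(&cq);	/* done() runs on live memory */
	late = demo_free(&req);		/* now safe */
	return (early == -1 && polled == 1 && late == 0 && !req) ? 0 : 1;
}
```

Stopping the poll work before destroying the CQ protects the CQ itself, but nothing in this model (or, I suspect, in the real code) stops a caller from freeing the request while its completion is still queued. The in_flight check is only there to show the ordering that has to hold.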