Re: Kernel v4.16 / v4.17 SRP and SRPT patches

Jason Gunthorpe <jgg@xxxxxxxx> · Wed, 10 Jan 2018 12:17:58 -0700

On Wed, Jan 10, 2018 at 06:40:25PM +0000, Bart Van Assche wrote:
> On Wed, 2018-01-10 at 11:26 -0700, Jason Gunthorpe wrote:
> > On Wed, Jan 10, 2018 at 08:42:03AM -0500, Laurence Oberman wrote:
> > 
> > > [  946.647514] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> > > [  946.691954] BUG: unable to handle kernel paging request at 00000000a2129b93
> > > [  947.889552] Call Trace:
> > > [  947.903724]  ? __ib_process_cq+0x55/0xa0 [ib_core]
> > > [  947.931179]  ? ib_cq_poll_work+0x1b/0x60 [ib_core]
> > > [  947.958153]  ? process_one_work+0x141/0x340
> > > [  947.981362]  ? worker_thread+0x47/0x3e0
> > > [  948.002102]  ? kthread+0xf5/0x130
> > > [  948.020538]  ? rescuer_thread+0x380/0x380
> > > [  948.043180]  ? kthread_associate_blkcg+0x90/0x90
> > > [  948.070184]  ? ret_from_fork+0x1f/0x30
> > 
> > These oopses strongly suggest that ib_wc->wr_cqe
> > is garbage.
> > 
> > Did SRP free its wr_cqe data before completion somehow?
> > 
> > Turn on slab poisoning to confirm?
> 
> It's easy to see in drivers/infiniband/core/cq.c that polling is
> stopped before a completion queue is destroyed (see also the
> cancel_work_sync(&cq->work) and the cq->device->destroy_cq(cq) calls
> in ib_free_cq()).

But that has nothing directly to do with the lifetime of, say, struct
srp_request, which contains the struct ib_cqe that ib_wc->wr_cqe
points to?

E.g. freeing struct srp_request before its wr_cqe has passed through
the CQ poll would leave the poll loop calling through freed memory,
which would produce these sorts of symptoms.
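
To make the lifetime concern concrete, here is a minimal user-space
sketch. It is not the SRP code: struct mock_request, mock_post(),
mock_process_cq() and mock_request_free() are hypothetical names, and
the ib_* structs are cut-down stand-ins that keep only the fields
needed here. The one thing it borrows from the real CQ code is the
indirect call wc->wr_cqe->done(cq, wc).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel types (assumption: fields trimmed). */
struct ib_cq;
struct ib_wc;

struct ib_cqe {
	void (*done)(struct ib_cq *cq, struct ib_wc *wc);
};

struct ib_wc {
	struct ib_cqe *wr_cqe;	/* pointer handed over when the WR was posted */
};

struct ib_cq {
	struct ib_wc pending[4];	/* completions not yet polled */
	int npending;
};

/* Hypothetical request object that embeds its completion entry. */
struct mock_request {
	struct ib_cqe cqe;
	int in_flight;		/* 1 while the CQ still points at cqe */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Completion handler: recovers the request from the embedded cqe. */
static void mock_request_done(struct ib_cq *cq, struct ib_wc *wc)
{
	struct mock_request *req =
		container_of(wc->wr_cqe, struct mock_request, cqe);

	(void)cq;
	req->in_flight = 0;
}

/* Posting stores a raw pointer into *req inside the CQ. */
static void mock_post(struct ib_cq *cq, struct mock_request *req)
{
	assert(cq->npending < 4);
	req->cqe.done = mock_request_done;
	req->in_flight = 1;
	cq->pending[cq->npending++].wr_cqe = &req->cqe;
}

/* Polling dereferences wr_cqe and jumps through ->done. */
static int mock_process_cq(struct ib_cq *cq)
{
	int i, n = cq->npending;

	for (i = 0; i < n; i++) {
		struct ib_wc *wc = &cq->pending[i];

		wc->wr_cqe->done(cq, wc);
	}
	cq->npending = 0;
	return n;
}

/* Refuses to free while the CQ can still reach the request. */
static int mock_request_free(struct mock_request *req)
{
	if (req->in_flight)
		return -1;	/* freeing now would leave wr_cqe dangling */
	free(req);
	return 0;
}
```

If mock_request_free() skipped the in_flight check, the later
mock_process_cq() call would load ->done from freed (possibly reused)
memory and jump to whatever value is there, which is the same shape
as a jump into a non-executable page from __ib_process_cq().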

Jason