On Fri, Sep 11, 2020 at 11:20:17PM +0800, Hillf Danton wrote:
> 
> On Fri, 11 Sep 2020 08:57:50 -0300 Jason Gunthorpe wrote:
> > On Fri, Sep 11, 2020 at 12:16:40PM +0800, Hillf Danton wrote:
> > > Detect race destroying ctx in order to avoid UAF.
> > >
> > > +++ b/drivers/infiniband/core/ucma.c
> > > @@ -625,6 +625,10 @@ static ssize_t ucma_destroy_id(struct uc
> > >  		return PTR_ERR(ctx);
> > >
> > >  	mutex_lock(&ctx->file->mut);
> > > +	if (ctx->destroying == 1) {
> > > +		mutex_unlock(&ctx->file->mut);
> > > +		return -ENXIO;
> > > +	}
> > >  	ctx->destroying = 1;
> > >  	mutex_unlock(&ctx->file->mut);
> > >
> > > @@ -1826,6 +1830,8 @@ static int ucma_close(struct inod
> > >
> > >  	mutex_lock(&file->mut);
> > >  	list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
> > > +		if (ctx->destroying == 1)
> > > +			continue;
> > >  		ctx->destroying = 1;
> > >  		mutex_unlock(&file->mut);
> >
> > ucma_destroy_id() is called from write() and ucma_close() is release(),
> > so there is no way these can race?
> 
> Sounds good, but what's reported is a UAF in the close path, which is
> impossible without another thread releasing the ctx a step ahead of
> the closer.
> Can we call it a race if that's true?

Migrate is the cause, very tricky:

        CPU0                               CPU1
   ucma_destroy_id()                  ucma_migrate_id()
      ucma_get_ctx()
                                         xa_lock()
                                          _ucma_find_context()
                                          xa_erase()
      xa_lock()                           ctx->file = new_file
                                          list_move()
                                         xa_unlock()
      ucma_put_ctx()
                                      ucma_close()
      _destroy_id()                      _destroy_id()
                                            wait_for_completion()  // boom

i.e. the destroy_id() on the initial FD captures the ctx right before
migrate moves it, then the new FD closes, calling destroy while the
other destroy is still running.

Sigh, I will rewrite migrate too..

Jason