Re: uverbs radix tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 26, 2018 at 02:44:46PM -0600, Jason Gunthorpe wrote:
> On Wed, Sep 26, 2018 at 01:23:41PM -0700, Matthew Wilcox wrote:
> > CPU A:
> >
> >         srcu_key = srcu_read_lock(&file->device->disassociate_srcu);
> >         err = ib_uverbs_cmd_verbs(file, &hdr, user_hdr->attrs);
> > ->
> >         slot = radix_tree_iter_lookup(
> >                 &uapi->radix, &attrs_iter,
> >                 uapi_key_obj(hdr->object_id) |
> >                         uapi_key_ioctl_method(hdr->method_id));
> > 
> > CPU B:
> >         radix_tree_for_each_slot (slot, &uapi->radix, &iter, 0) {
> >                 kfree(rcu_dereference_protected(*slot, true));
> >                 radix_tree_iter_delete(&uapi->radix, &iter, slot);
> >         }
> > 
> > radix_tree_iter_delete() ends up calling radix_tree_node_free()
> > which does a call_rcu(&node->rcu_head, radix_tree_node_rcu_free)
> 
> > You're holding an srcu_read_lock(), not an rcu_read_lock(), so
> > as far as RCU is concerned, you're not within a grace period, and
> > it doesn't need to wait for you.
> 
> Hmm....
> 
> This usage of the radix tree is not using RCU. The tree is created at
> the start of time and is used read-only until the final kref is put,
> then it is destroyed.
> 
> The race above cannot exist because the kref on file->device will not
> go to zero while CPU A is running that code above. Specifically the
> file's kref is held by a struct file * and the fs core will hold the
> kref on struct file during execution of the ioctl file_operation.

Ah, OK.  That makes sense, but good luck explaining it to lockdep ;-)

> I think the mistake here is the misleading use of srcu_dereference():
> 
>    srcu_dereference(
> 		*slot,
> 		&pbundle->bundle.ufile->device->disassociate_srcu);
> 
> That should just be rcu_dereference_protected(*slot, true), I think?

aka rcu_dereference_raw()?  Yes, that's probably better.

> > I'm _really_ not a fan of the way you pull various things out of the
> > radix_tree_iter and keep references to them in the pbundle.  Can we
> > figure out a better way to do this?  I don't have a clear picture of
> > your requirements at this stage.
> 
> Yes, I figured that would come up with your xarray work.
> 
> All that is needed here is a 'lookup with hint' API. Something like:
> 
>  struct iter last_position;
> 
>  lookup(0x120, &last_position);
>  lookup_hint(0x123, &last_position);
>  lookup_hint(0x124, &last_position);
> 
> Where lookup_hint is fast if last_position already points at the
> correct chunk. Otherwise it will search again and update
> last_position.
> 
> The last time I looked at the xarray code I thought this would be
> straightforward to add as an API, and I was thinking of proposing
> something like the above when xarray is finally merged.

I can work with that.  I'll send a suggested patch tomorrow.  Probably
something like void *xas_move(struct xa_state *, unsigned long index);
I already have xas_next() and xas_prev() which do the moral equivalent of
p = xa[++i] and p = xa[--i], so having an xas_move() would fit in.



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux