On Wed, Sep 26, 2018 at 02:44:46PM -0600, Jason Gunthorpe wrote:
> On Wed, Sep 26, 2018 at 01:23:41PM -0700, Matthew Wilcox wrote:
> > CPU A:
> >
> > srcu_key = srcu_read_lock(&file->device->disassociate_srcu);
> > err = ib_uverbs_cmd_verbs(file, &hdr, user_hdr->attrs);
> >  ->
> > 	slot = radix_tree_iter_lookup(
> > 		&uapi->radix, &attrs_iter,
> > 		uapi_key_obj(hdr->object_id) |
> > 		uapi_key_ioctl_method(hdr->method_id));
> >
> > CPU B:
> > 	radix_tree_for_each_slot (slot, &uapi->radix, &iter, 0) {
> > 		kfree(rcu_dereference_protected(*slot, true));
> > 		radix_tree_iter_delete(&uapi->radix, &iter, slot);
> > 	}
> >
> > radix_tree_iter_delete() ends up calling radix_tree_node_free()
> > which does a call_rcu(&node->rcu_head, radix_tree_node_rcu_free)
>
> > You're holding an srcu_read_lock(), not an rcu_read_lock(), so
> > as far as RCU is concerned, you're not within a grace period, and
> > it doesn't need to wait for you.
>
> Hmm....
>
> This usage of the radix tree is not using RCU. The tree is created at
> the start of time and is used read-only until the final kref is put,
> then it is destroyed.
>
> The race above cannot exist because the kref on file->device will not
> go to zero while CPU A is running that code above. Specifically the
> file's kref is held by a struct file * and the fs core will hold the
> kref on struct file during execution of the ioctl file_operation.

Ah, OK.  That makes sense, but good luck explaining it to lockdep ;-)

> I think the mistake here is the misleading use of srcu_dereference():
>
> 	srcu_dereference(
> 		*slot,
> 		&pbundle->bundle.ufile->device->disassociate_srcu);
>
> That should just be rcu_dereference_protected(*slot, true), I think?

aka rcu_dereference_raw()?  Yes, that's probably better.

> > I'm _really_ not a fan of the way you pull various things out of the
> > radix_tree_iter and keep references to them in the pbundle.  Can we
> > figure out a better way to do this?  I don't have a clear picture of
> > your requirements at this stage.
>
> Yes, I figured that would come up with your xarray work.
>
> All that is needed here is a 'lookup with hint' API. Something like:
>
>   struct iter last_position;
>
>   lookup(0x120, &last_position);
>   lookup_hint(0x123, &last_position);
>   lookup_hint(0x124, &last_position);
>
> Where lookup_hint is fast if last_position already points at the
> correct chunk. Otherwise it will search again and update
> last_position.
>
> The last time I looked at the xarray code I thought this would be
> straightforward to add as an API, and I was thinking of proposing
> something like the above when xarray is finally merged.

I can work with that.  I'll send a suggested patch tomorrow.
Probably something like

void *xas_move(struct xa_state *, unsigned long index);

I already have xas_next() and xas_prev() which do the moral equivalent
of p = xa[++i] and p = xa[--i], so having an xas_move() would fit in.
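
To make that concrete, here is an untested sketch of how a hint-based
walk could look on top of that proposed call.  The indices are the ones
from the example above; the function and variable names around it are
made up for illustration, and no locking is shown (in the uverbs case
the tree is immutable while the device kref is held, so there is no RCU
to worry about):

	#include <linux/xarray.h>

	/* Proposed in this thread; not in the tree yet. */
	void *xas_move(struct xa_state *xas, unsigned long index);

	static void lookup_example(struct xarray *xa)
	{
		/* Cursor state carried between lookups, starting at 0. */
		XA_STATE(xas, xa, 0);
		void *method_elm, *attr_elm;

		/* First lookup walks from the root and caches the node. */
		method_elm = xas_move(&xas, 0x120);

		/*
		 * Nearby indices reuse the cached node instead of
		 * re-walking from the root.
		 */
		attr_elm = xas_move(&xas, 0x123);
		attr_elm = xas_move(&xas, 0x124);
	}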
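
And back on the srcu_dereference() point above, an untested sketch of
what the lookup in the ioctl path would look like with the annotation
changed, so the "no RCU here" intent is visible (error handling
abbreviated, errno is a placeholder, not the actual patch):

	struct radix_tree_iter attrs_iter;
	void __rcu **slot;
	const void *method_elm;

	slot = radix_tree_iter_lookup(&uapi->radix, &attrs_iter,
				      uapi_key_obj(hdr->object_id) |
				      uapi_key_ioctl_method(hdr->method_id));
	if (!slot)
		return -ENOENT;	/* placeholder errno */

	/*
	 * The tree is never modified while the struct file holds the
	 * kref on file->device, so this is a plain load, not an
	 * (S)RCU-protected one.
	 */
	method_elm = rcu_dereference_protected(*slot, true);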