Re: uverbs radix tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 26, 2018 at 04:02:20PM -0600, Jason Gunthorpe wrote:
> On Wed, Sep 26, 2018 at 02:08:15PM -0700, Matthew Wilcox wrote:
> > On Wed, Sep 26, 2018 at 02:44:46PM -0600, Jason Gunthorpe wrote:
> > > This usage of the radix tree is not using RCU. The tree is created at
> > > the start of time and is used read-only until the final kref is put,
> > > then it is destroyed.
> > > 
> > > The race above cannot exist because the kref on file->device will not
> > > go to zero while CPU A is running that code above. Specifically the
> > > file's kref is held by a struct file * and the fs core will hold the
> > > kref on struct file during execution of the ioctl file_operation.
> > 
> > Ah, OK.  That makes sense, but good luck explaining it to lockdep ;-)
> 
> I haven't noticed anything weird with lockdep, where do you see it
> will have trouble?

What I mean is, you're defeating lockdep by passing "true" as the second
argument.  It can't validate that your assumptions are correct.  I suppose
instead of "true", you could write "kref > 0" or whatever.

> > > I think the mistake here is the misleading use of srcu_dereference():
> > > 
> > >    srcu_dereference(
> > > 		*slot,
> > > 		&pbundle->bundle.ufile->device->disassociate_srcu);
> > > 
> > > That should just be rcu_dereference_protected(*slot, true), I think?
> > 
> > aka rcu_dereference_raw()?  Yes, that's probably better.
> 
> I've been following the docs in Documentation/RCU/lockdep.txt:
> 
>         rcu_dereference_raw(p):
>                 Don't check.  (Use sparingly, if at all.)
>         rcu_dereference_protected(p, c):
>                 Use explicit check expression "c", and omit all barriers
>                 and compiler constraints.  This is useful when the data
>                 structure cannot change, for example, in code that is
>                 invoked only by updaters.
> 
> In this case 'the data structure cannot change' as we are protected
> from that by kref.
> 
> Do you know if the docs are wrong now and _raw() is now the right
> choice?

This is further down the rcu rabbithole than I'm comfortable with.
I see _raw as "I know what I'm doing" ... I don't think anyone foresaw
people using 'true' as the condition to _protected.  I think they're
equivalent and '_protected(x, true)' is just a more verbose way of
spelling '_raw(x)'.

> > > > I'm _really_ not a fan of the way you pull various things out of the
> > > > radix_tree_iter and keep references to them in the pbundle.  Can we
> > > > figure out a better way to do this?  I don't have a clear picture of
> > > > your requirements at this stage.
> > > 
> > > Yes, I figured that would come up with your xarray work.
> > > 
> > > All that is needed here is a 'lookup with hint' API. Something like:
> > > 
> > >  struct iter last_position;
> > > 
> > >  lookup(0x120, &last_position);
> > >  lookup_hint(0x123, &last_position);
> > >  lookup_hint(0x124, &last_position);
> > > 
> > > Where lookup_hint is fast if last_position already points at the
> > > correct chunk. Otherwise it will search again and update
> > > last_position.
> > > 
> > > The last time I looked at the xarray code I thought this would be
> > > straightforward to add as an API, and I was thinking of proposing
> > > something like the above when xarray is finally merged.
> > 
> > I can work with that.  I'll send a suggested patch tomorrow.
> > Probably something like void *xas_move(struct xa_state *, unsigned
> > long index); I already have xas_next() and xas_prev() which do the
> > moral equivalent of p = xa[++i] and p = xa[--i], so having an
> > xas_move() would fit in.
> 
> So we could write it as:
> 
>  XA_STATE(iter, radix, 0x120);
> 
>  value = xas_load(&iter);
>  value = xas_move(0x124, &iter); /* ie + 4 */
>  value = xas_move(0x123, &iter); /* ie - 1 */
> 
> ?
> 
> I guess it is worth pointing out the only reason it is like this is
> performance on a system call path. The radix tree version is one
> compare and an add in the typical case, it would be nice to stay
> within a similar envelope.
> 
> ie update of the state's offset and index on every lookup is not
> something that is really needed for this use case.
> 
> Also, currently the info is in the 'pbundle' thing.. Does this need
> some rework to keep the XA_STATE separated on the stack? Or something
> else?
> 
> Are you looking at this because you plan to convert this code in the
> next merge window? Do you have a way we can merge that conversion
> through the RDMA git tree? I am willing to help with that.. Otherwise
> I fear we will have monster conflicts for Linus.
> 
> I was expecting to do the conversion in the cycle after xarray
> lands...

I wasn't expecting current users of the radix tree to help ;-)  And I was
getting a head start on some of the harder conversions so I could send
them to maintainers early.  But yes, my plan is to send the conversions
to maintainers for merging after the base XArray code lands, not try and
mass-convert radix trees to xarrays through my own tree.

Meanwhile I've come up with another problem which is that the plan is to
make xarray nodes movable.  That means that *even if the xarray node is
pinned* by your own code, the xarray may decide to relocate the entries
if you're not holding the RCU read lock or the xarray lock.

Do you really need the hint?  For a three level raidx tree, that's only
6 additional cachelines.  Prmtr optmstn s th rt f ll vl.



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux