On Fri, Sep 28, 2018 at 04:19:11PM -0600, Jason Gunthorpe wrote:
> On Fri, Sep 28, 2018 at 02:24:40PM -0700, Matthew Wilcox wrote:
> They are not the same..
>
> _raw has barriers, while _protected does not.
>
> ie _raw is the same as rcu_dereference except that it doesn't lockdep
> check the caller is in a RCU critical region.
>
> It looks like _protected is supposed to be used if the caller has
> excluded writers (ie this case), and since we are excluding by control
> flow not locking the 'true' is appropriate. 'git grep' shows it is
> pretty common to pass true here, maybe even the most common usage..

Yes, there do seem to be a lot of callers with 'true' as the second
parameter.

> > I wasn't expecting current users of the radix tree to help ;-) And I was
> > getting a head start on some of the harder conversions so I could send
> > them to maintainers early. But yes, my plan is to send the conversions
> > to maintainers for merging after the base XArray code lands, not try and
> > mass-convert radix trees to xarrays through my own tree.
>
> Sure, let me know, I can help you out with rdma.

I appreciate it! And I appreciate the discussion here ...

> > Meanwhile I've come up with another problem, which is that the plan is to
> > make xarray nodes movable. That means that *even if the xarray node is
> > pinned* by your own code, the xarray may decide to relocate the entries
> > if you're not holding the RCU read lock or the xarray lock.
>
> This is Christoph's work?

Yes, Christoph and I have been working on it for almost two years ...
while getting distracted with other projects.

> Ah, I see, xarray has to hold a RCU critical section all over the
> place and radix tree did not.. Hum, I think uverbs is OK with that
> except for this one case.

Right, the radix tree said "the locking is up to the callers" whereas the
XArray defines what its locking is.

> > Do you really need the hint? For a three level radix tree, that's only
> > 6 additional cachelines.
Premature optimization is the root of all evil.

> Sounds like it is some SMP barriers now that xarray requires RCU as
> well as some function calls and atomics? Looped 10-20 times per system
> call :( already hot cache lines seem like the least worry here.

Umm, I'm not sure I see the atomics in the xa_load path.

void *xa_load(struct xarray *xa, unsigned long index)
{
	XA_STATE(xas, xa, index);
	void *entry;

	rcu_read_lock();
	do {
		entry = xas_load(&xas);
	} while (xas_retry(&xas, entry));
	rcu_read_unlock();

	return entry;
}

Other than debugging code, there are no atomics in rcu_read_lock().

void *xas_load(struct xa_state *xas)
{
	void *entry = xas_start(xas);

	while (xa_is_node(entry)) {
		struct xa_node *node = xa_to_node(entry);

		if (xas->xa_shift > node->shift)
			break;
		entry = xas_descend(xas, node);
	}
	return entry;
}

The only thing that looks even remotely barrier-like or atomic in here is
xa_entry(), where it calls:

	return rcu_dereference_check(node->slots[offset],
					lockdep_is_held(&xa->xa_lock));

which is only going to insert barriers on Alpha. This is better than what
the radix tree does, which uses rcu_dereference_raw() all over the place
with an occasional rcu_dereference().

> The system call path is one of the things we consider a fast path in
> the subsystem, so care has been taken to keep things out of the
> way. This code got a nice 6% bump when it was moved from the
> open-coded tree scheme to the radix tree, in part because we get this
> nice fast 3-level lookup, compared to the deeper and very branch-heavy
> version that came before.
>
> However, this code is still being renovated and we haven't yet put the
> performance-critical stuff into ioctl. So if the xarray can't handle
> this efficiently then please just replace it with a full lookup.
>
> We can benchmark again and if it is bad we can burn some memory to
> pre-compute equivalent linear lookup tables under uverbs's control so
> the usage doesn't trouble the xarray.

That might be better ...
just use the xarray to look up the base of your own private data structure and put a pointer to that into your bundle_priv.