Re: [PATCH bpf-next v2 00/13] Add support for qp-trie with dynptr key

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Sun, 9 Oct 2022 04:04:26 -0700

On Sun, Oct 09, 2022 at 06:45:22PM +0800, Hou Tao wrote:
> Hi,
> On 10/9/2022 5:05 PM, Paul E. McKenney wrote:
> > On Sun, Oct 09, 2022 at 09:09:44AM +0800, Hou Tao wrote:
> >> Hi Paul,
> >>
> >> On 10/9/2022 4:11 AM, Paul E. McKenney wrote:
> >>> On Sat, Oct 08, 2022 at 09:40:04AM -0700, Alexei Starovoitov wrote:
> >>>> On Sat, Oct 8, 2022 at 6:22 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >>>>> On Fri, Oct 07, 2022 at 06:59:08PM -0700, Alexei Starovoitov wrote:
> >> SNIP
> >>>>>>> Understand. I was just trying to understand the exact performance overhead of
> >>>>>>> call_rcu(). If the overhead of map operations are much greater than the overhead
> >>>>>>> of call_rcu(), I think calling call_rcu() one millions a second will be not a
> >>>>>>> problem and  it also makes the implementation of qp-trie being much simpler. The
> >>>>>>> OOM problem is indeed a problem, although it is also possible for the current
> >>>>>>> implementation, so I will try to implement the lookup procedure which handles
> >>>>>>> the reuse problem.
> >>>>>> call_rcu is not just that particular function.
> >>>>>> It's all the work rcu subsystem needs to do to observe gp
> >>>>>> and execute that callback. Just see how many kthreads it will
> >>>>>> start when overloaded like this.
> >>>>> The kthreads to watch include rcu_preempt, rcu_sched, ksoftirqd*, rcuc*,
> >>>>> and rcuo*.  There is also the back-of-interrupt softirq context, which
> >>>>> requires some care to measure accurately.
> >>>>>
> >>>>> The possibility of SLAB_TYPESAFE_BY_RCU has been discussed.  I take it
> >>>>> that the per-element locking overhead for exact iterations was a problem?
> >>>>> If so, what exactly are the consistency rules for iteration?  Presumably
> >>>>> stronger than "if the element existed throughout, it is included in the
> >>>>> iteration; if it did not exist throughout, it is not included; otherwise
> >>>>> it might or might not be included" given that you get that for free.
> >>>>>
> >>>>> Either way, could you please tell me the exact iteration rules?
> >>>> The rules are the way we make them to be.
> >>>> iteration will be under lock.
> >>>> lookup needs to be correct. It can retry if necessary (like htab is doing).
> >>>> Randomly returning 'noexist' is, of course, not acceptable.
> >>> OK, so then it is important that updates to this data structure be
> >>> carried out in such a way as to avoid discombobulating lockless readers.
> >>> Do the updates have that property?
> >> Yes. The update procedure will copy the old pointer array to a new array first,
> >> then update the new array and replace the pointer of old array by the pointer of
> >> new array.
> > Very good.  But then why is there a problem?  Is the iteration using
> > multiple RCU read-side critical sections or something?
> 
> The problem is that although the objects are RCU-freed, but these object also
> can be reused immediately in bpf memory allocator. The reason for reuse is for
> performance and is to reduce the possibility of OOM. Because the object can be
> reused during RCU-protected lookup and the possibility of reuse is low, the
> lookup procedure needs to check whether reuse is happening during lookup. And I
> was arguing with Alexei about whether or no it is reasonable to provide an
> optional flag to remove the immediate reuse in bpf memory allocator.

Indeed, in that case there needs to be a check, for example as described
in the comment preceding the definition of SLAB_TYPESAFE_BY_RCU.

If the use of the element is read-only on the one hand or
heuristic/statistical on the other, lighter weight approaches are
possible.  Would that help?

							Thanx, Paul

> >>> The usual way to get that property is to leave the old search structure
> >>> around, replacing it with the new one, and RCU-freeing the old one.
> >>> In case it helps, Kung and Lehman describe how to do that for search trees:
> >>>
> >>> http://www.eecs.harvard.edu/~htk/publication/1980-tods-kung-lehman.pdf
> >> Thanks for the paper. Just skimming through it, it seems that it uses
> >> reference-counting and garbage collection to solve the safe memory reclamation
> >> problem. It may be too heavy for qp-trie and we plan to use seqcount-like way to
> >> check whether or not the branch and the leaf node is reused during lookup, and
> >> retry the lookup if it happened. Now just checking the feasibility of the
> >> solution and it seems a little complicated than expected.
> > The main thing in that paper is the handling of rotations in the
> > search-tree update.  But if you are not using a tree, that won't be all
> > that relevant.
> I see. Thanks for the explanation.
> >
> > 								Thanx, Paul
>