Hi, On 10/9/2022 5:05 PM, Paul E. McKenney wrote: > On Sun, Oct 09, 2022 at 09:09:44AM +0800, Hou Tao wrote: >> Hi Paul, >> >> On 10/9/2022 4:11 AM, Paul E. McKenney wrote: >>> On Sat, Oct 08, 2022 at 09:40:04AM -0700, Alexei Starovoitov wrote: >>>> On Sat, Oct 8, 2022 at 6:22 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: >>>>> On Fri, Oct 07, 2022 at 06:59:08PM -0700, Alexei Starovoitov wrote: >> SNIP >>>>>>> Understand. I was just trying to understand the exact performance overhead of >>>>>>> call_rcu(). If the overhead of map operations are much greater than the overhead >>>>>>> of call_rcu(), I think calling call_rcu() one millions a second will be not a >>>>>>> problem and it also makes the implementation of qp-trie being much simpler. The >>>>>>> OOM problem is indeed a problem, although it is also possible for the current >>>>>>> implementation, so I will try to implement the lookup procedure which handles >>>>>>> the reuse problem. >>>>>> call_rcu is not just that particular function. >>>>>> It's all the work rcu subsystem needs to do to observe gp >>>>>> and execute that callback. Just see how many kthreads it will >>>>>> start when overloaded like this. >>>>> The kthreads to watch include rcu_preempt, rcu_sched, ksoftirqd*, rcuc*, >>>>> and rcuo*. There is also the back-of-interrupt softirq context, which >>>>> requires some care to measure accurately. >>>>> >>>>> The possibility of SLAB_TYPESAFE_BY_RCU has been discussed. I take it >>>>> that the per-element locking overhead for exact iterations was a problem? >>>>> If so, what exactly are the consistency rules for iteration? Presumably >>>>> stronger than "if the element existed throughout, it is included in the >>>>> iteration; if it did not exist throughout, it is not included; otherwise >>>>> it might or might not be included" given that you get that for free. >>>>> >>>>> Either way, could you please tell me the exact iteration rules? >>>> The rules are the way we make them to be. >>>> iteration will be under lock. >>>> lookup needs to be correct. It can retry if necessary (like htab is doing). >>>> Randomly returning 'noexist' is, of course, not acceptable. >>> OK, so then it is important that updates to this data structure be >>> carried out in such a way as to avoid discombobulating lockless readers. >>> Do the updates have that property? >> Yes. The update procedure will copy the old pointer array to a new array first, >> then update the new array and replace the pointer of old array by the pointer of >> new array. > Very good. But then why is there a problem? Is the iteration using > multiple RCU read-side critical sections or something? The problem is that although the objects are RCU-freed, but these object also can be reused immediately in bpf memory allocator. The reason for reuse is for performance and is to reduce the possibility of OOM. Because the object can be reused during RCU-protected lookup and the possibility of reuse is low, the lookup procedure needs to check whether reuse is happening during lookup. And I was arguing with Alexei about whether or no it is reasonable to provide an optional flag to remove the immediate reuse in bpf memory allocator. > >>> The usual way to get that property is to leave the old search structure >>> around, replacing it with the new one, and RCU-freeing the old one. >>> In case it helps, Kung and Lehman describe how to do that for search trees: >>> >>> http://www.eecs.harvard.edu/~htk/publication/1980-tods-kung-lehman.pdf >> Thanks for the paper. Just skimming through it, it seems that it uses >> reference-counting and garbage collection to solve the safe memory reclamation >> problem. It may be too heavy for qp-trie and we plan to use seqcount-like way to >> check whether or not the branch and the leaf node is reused during lookup, and >> retry the lookup if it happened. Now just checking the feasibility of the >> solution and it seems a little complicated than expected. > The main thing in that paper is the handling of rotations in the > search-tree update. But if you are not using a tree, that won't be all > that relevant. I see. Thanks for the explanation. > > Thanx, Paul