Hi, On 2/27/2025 9:59 AM, Alexei Starovoitov wrote: > On Wed, Feb 26, 2025 at 5:48 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: >> Hi, >> >> On 2/27/2025 7:17 AM, Zvi Effron wrote: >>> On Tue, Feb 25, 2025 at 9:42 PM Alexei Starovoitov >>> <alexei.starovoitov@xxxxxxxxx> wrote: >>>> On Tue, Feb 25, 2025 at 8:05 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: >>>>> Hi, >>>>> >>>>> On 2/26/2025 11:24 AM, Alexei Starovoitov wrote: >>>>>> On Sat, Feb 8, 2025 at 2:17 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: >>>>>>> Hi Toke, >>>>>>> >>>>>>> On 2/6/2025 11:05 PM, Toke Høiland-Jørgensen wrote: >>>>>>>> Hou Tao <houtao@xxxxxxxxxxxxxxx> writes: >>>>>>>> >>>>>>>>> +cc Cody Haas >>>>>>>>> >>>>>>>>> Sorry for the resend. I sent the reply in the HTML format. >>>>>>>>> >>>>>>>>> On 2/4/2025 4:28 PM, Hou Tao wrote: >>>>>>>>>> Currently, the update of existing element in hash map involves two >>>>>>>>>> steps: >>>>>>>>>> 1) insert the new element at the head of the hash list >>>>>>>>>> 2) remove the old element >>>>>>>>>> >>>>>>>>>> It is possible that the concurrent lookup operation may fail to find >>>>>>>>>> either the old element or the new element if the lookup operation starts >>>>>>>>>> before the addition and continues after the removal. >>>>>>>>>> >>>>>>>>>> Therefore, replacing the two-step update with an atomic update. After >>>>>>>>>> the change, the update will be atomic in the perspective of the lookup >>>>>>>>>> operation: it will either find the old element or the new element. >>>>>> I'm missing the point. >>>>>> This "atomic" replacement doesn't really solve anything. >>>>>> lookup will see one element. >>>>>> That element could be deleted by another thread. >>>>>> bucket lock and either two step update or single step >>>>>> don't change anything from the pov of bpf prog doing lookup. >>>>> The point is that overwriting an existed element may lead to concurrent >>>>> lookups return ENOENT as demonstrated by the added selftest and the >>>>> patch tried to "fix" that. However, it seems using >>>>> hlist_nulls_replace_rcu() for the overwriting update is still not >>>>> enough. Because when the lookup procedure found the old element, the old >>>>> element may be reusing, the comparison of the map key may fail, and the >>>>> lookup procedure may still return ENOENT. >>>> you mean l_old == l_new ? I don't think it's possible >>>> within htab_map_update_elem(), >>>> but htab_map_delete_elem() doing hlist_nulls_del_rcu() >>>> then free_htab_elem, htab_map_update_elem, alloc, hlist_nulls_add_head_rcu >>>> and just deleted elem is reused to be inserted >>>> into another bucket. >> No. I mean the following procedure in which the lookup procedure finds >> the old element and tries to match its key, one update procedure has >> already deleted the old element, and another update procedure is reusing >> the old element: >> >> lookup procedure A >> A: find the old element (instead of the new old) >> >> update procedure B >> B: delete the old element >> update procedure C on the same CPU: >> C: reuse the old element (overwrite its key and insert in >> the same bucket) >> >> A: the key is mismatched and return -ENOENT. > This is fine. It's just normal reuse. > Orthogonal to 'update as insert+delete' issue. OK. However, it will break the lookup procedure because it expects it will return an valid result instead of -ENOENT. > >> It can be reproduced by increasing ctx.loop in the selftest. >>>> I'm not sure whether this new hlist_nulls_replace_rcu() >>>> primitive works with nulls logic. >>>> >>>> So back to the problem statement.. >>>> Are you saying that adding new to a head while lookup is in the middle >>>> is causing it to miss an element that >>>> is supposed to be updated assuming atomicity of the update? >>>> While now update_elem() is more like a sequence of delete + insert? >>>> >>>> Hmm. >>> Yes, exactly that. Because update_elem is actually a delete + insert (actually >>> an insert + delete, I think?), it is possible for a concurrent lookup to see no >>> element instead of either the old or new value. >> Yep. >>>>> I see. In v2 I will fallback to the original idea: adding a standalone >>>>> update procedure for htab of maps in which it will atomically overwrite >>>>> the map_ptr just like array of maps does. >>>> hold on. is this only for hash-of-maps ? >>> I believe this was also replicated for hash as well as hash-of-maps. Cody can >>> confirm, or use the reproducer he has to test that case. >> The fix for hash-of-maps will be much simpler and the returned map >> pointer will be correct. For other types of hash map, beside switching >> to hlist_nulls_replace_rcu(), > It's been a long time since I looked into rcu_nulls details. > Pls help me understand that this new replace_rcu_nulls() > is correct from nulls pov, > If it is then this patch set may be the right answer to non-atomic update. If I understand correctly, only the manipulations of ->first pointer and ->next pointer need to take care of nulls pointer. > > And for the future, please please focus on "why" part in > the cover letter and commit logs instead of "what". > > Since the only thing I got from the log was: > "Currently, the update is not atomic > because the overwrite of existing element happens in a two-steps way, > but the support of atomic update is feasible". > > "is feasible" doesn't explain "why". > > Link to xdp-newbie question is nice for additional context, > but reviewers should not need to go and read some thread somewhere > to understand "why" part. > All of it should be in the commit log. OK. My original thought is that is a reported problem, so an extra link will be enough. Will try to add more context next time. > >> map may still be incorrect (as shown long time ago [1]), so I think >> maybe for other types of map, the atomic update doesn't matter too much. >> >> [1]: >> https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@xxxxxxxxxxxxxxx/ > A thread from 3 years ago ?! Sorry, it's not helpful to ask > people to page-in such an old context with lots of follow ups > that may or may not be relevant today. Let me reuse part of the diagram above to explain how does the lookup procedure may return a incorrect value: lookup procedure A A: find the old element (instead of the new element) update procedure B B: delete the old element update procedure C on the same CPU: A: the key is matched and return the value in the element C: reuse the old element (overwrite its key and value) A: read the value (it is incorrect, because it has been reused and overwritten)