On Tue, Feb 25, 2025 at 9:42 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Feb 25, 2025 at 8:05 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On 2/26/2025 11:24 AM, Alexei Starovoitov wrote:
> > > On Sat, Feb 8, 2025 at 2:17 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > >> Hi Toke,
> > >>
> > >> On 2/6/2025 11:05 PM, Toke Høiland-Jørgensen wrote:
> > >>> Hou Tao <houtao@xxxxxxxxxxxxxxx> writes:
> > >>>
> > >>>> +cc Cody Haas
> > >>>>
> > >>>> Sorry for the resend. I sent the reply in the HTML format.
> > >>>>
> > >>>> On 2/4/2025 4:28 PM, Hou Tao wrote:
> > >>>>> Currently, the update of existing element in hash map involves two
> > >>>>> steps:
> > >>>>> 1) insert the new element at the head of the hash list
> > >>>>> 2) remove the old element
> > >>>>>
> > >>>>> It is possible that the concurrent lookup operation may fail to find
> > >>>>> either the old element or the new element if the lookup operation starts
> > >>>>> before the addition and continues after the removal.
> > >>>>>
> > >>>>> Therefore, replacing the two-step update with an atomic update. After
> > >>>>> the change, the update will be atomic in the perspective of the lookup
> > >>>>> operation: it will either find the old element or the new element.
> > > I'm missing the point.
> > > This "atomic" replacement doesn't really solve anything.
> > > lookup will see one element.
> > > That element could be deleted by another thread.
> > > bucket lock and either two step update or single step
> > > don't change anything from the pov of bpf prog doing lookup.
> >
> > The point is that overwriting an existed element may lead to concurrent
> > lookups return ENOENT as demonstrated by the added selftest and the
> > patch tried to "fix" that. However, it seems using
> > hlist_nulls_replace_rcu() for the overwriting update is still not
> > enough. Because when the lookup procedure found the old element, the old
> > element may be reusing, the comparison of the map key may fail, and the
> > lookup procedure may still return ENOENT.
>
> you mean l_old == l_new ? I don't think it's possible
> within htab_map_update_elem(),
> but htab_map_delete_elem() doing hlist_nulls_del_rcu()
> then free_htab_elem, htab_map_update_elem, alloc, hlist_nulls_add_head_rcu
> and just deleted elem is reused to be inserted
> into another bucket.
>
> I'm not sure whether this new hlist_nulls_replace_rcu()
> primitive works with nulls logic.
>
> So back to the problem statement..
> Are you saying that adding new to a head while lookup is in the middle
> is causing it to miss an element that
> is supposed to be updated assuming atomicity of the update?
> While now update_elem() is more like a sequence of delete + insert?
>
> Hmm.

Yes, exactly that. Because update_elem() is really a two-step sequence
(an insert followed by a delete, I believe, rather than delete + insert),
a concurrent lookup can find neither the old element nor the new one,
instead of seeing either the old or the new value.

> > I see. In v2 I will fallback to the original idea: adding a standalone
> > update procedure for htab of maps in which it will atomically overwrite
> > the map_ptr just like array of maps does.
>
> hold on. is this only for hash-of-maps ?

I believe the problem was reproduced for plain hash maps as well as for
hash-of-maps. Cody can confirm, or use his reproducer to test that case.

> How that non-atomic update manifested in real production?

See [1] (it is cited in the cover letter for this series; the link is
replicated below).
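To make the window concrete for anyone skimming the thread, here is a
minimal sketch of the two-step update and an interleaving that loses the
element. It is modeled loosely on kernel/bpf/hashtab.c, with the element
struct trimmed down to just the list linkage; an illustration, not the
actual kernel code:

#include <linux/rculist_nulls.h>

/* Trimmed-down element: only the list linkage matters for the race. */
struct htab_elem {
	struct hlist_nulls_node hash_node;
	/* hash, key and value elided */
};

/*
 * Updater side, called with the bucket lock held. This mirrors the
 * current two-step sequence: link the new element at the head, then
 * unlink the old one.
 */
static void two_step_update(struct hlist_nulls_head *head,
			    struct htab_elem *l_new,
			    struct htab_elem *l_old)
{
	hlist_nulls_add_head_rcu(&l_new->hash_node, head);	/* step 1 */
	hlist_nulls_del_rcu(&l_old->hash_node);			/* step 2 */
}

/*
 * Lockless lookup interleaving that returns ENOENT:
 *
 *   lookup: loads head->first and starts walking the bucket
 *           (l_new is not linked yet, so it will never be visited)
 *   update: step 1 links l_new at the head
 *   update: step 2 unlinks l_old; its predecessor's ->next now
 *           skips it
 *   lookup: walks straight past where l_old used to be
 *           -> finds neither element for a key that existed throughout
 */

The bucket lock serializes writers against each other, but lookups run
locklessly under RCU, which is why the ordering of the two stores is
visible to them.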
[1]: https://lore.kernel.org/xdp-newbies/07a365d8-2e66-2899-4298-b8b158a928fa@xxxxxxxxxxxxxxx/T/#m06fcd687c6cfdbd0f9b643b227e69b479fc8c2f6
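For completeness, since hlist_nulls_replace_rcu() is not in mainline: a
plausible shape for such a primitive, modeled on the existing
hlist_replace_rcu() in <linux/rculist.h>, is sketched below. This is my
guess at the idea, not the patch's actual code, and whether it interacts
correctly with the nulls retry logic is exactly the question Alexei
raises above:

#include <linux/rculist_nulls.h>

/*
 * Hypothetical sketch, NOT the patch's implementation: replace @old
 * with @new via a single store to the predecessor's ->next, so a
 * concurrent reader sees either @old or @new, never neither.
 */
static inline void hlist_nulls_replace_rcu(struct hlist_nulls_node *old,
					   struct hlist_nulls_node *new)
{
	struct hlist_nulls_node *next = old->next;

	new->next = next;
	WRITE_ONCE(new->pprev, old->pprev);
	/* single publication point: the predecessor now points at @new */
	rcu_assign_pointer(*(struct hlist_nulls_node __rcu **)new->pprev, new);
	/* @next may be a nulls marker rather than a real node */
	if (!is_a_nulls(next))
		WRITE_ONCE(next->pprev, &new->next);
	WRITE_ONCE(old->pprev, LIST_POISON2);
}

The single rcu_assign_pointer() to the predecessor's ->next is what makes
the swap atomic from a reader's point of view. It does not help a reader
already parked on the old node if that node is freed and reused in
another bucket, which is the residual problem Hou Tao describes above.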