Re: [RESEND] [PATCH bpf-next 2/3] bpf: Overwrite the element in hash map atomically

Hou Tao <houtao@xxxxxxxxxxxxxxx> · Thu, 27 Feb 2025 10:43:07 +0800



Hi,

On 2/27/2025 9:59 AM, Alexei Starovoitov wrote:
> On Wed, Feb 26, 2025 at 5:48 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> On 2/27/2025 7:17 AM, Zvi Effron wrote:
>>> On Tue, Feb 25, 2025 at 9:42 PM Alexei Starovoitov
>>> <alexei.starovoitov@xxxxxxxxx> wrote:
>>>> On Tue, Feb 25, 2025 at 8:05 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> On 2/26/2025 11:24 AM, Alexei Starovoitov wrote:
>>>>>> On Sat, Feb 8, 2025 at 2:17 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>>>>>> Hi Toke,
>>>>>>>
>>>>>>> On 2/6/2025 11:05 PM, Toke Høiland-Jørgensen wrote:
>>>>>>>> Hou Tao <houtao@xxxxxxxxxxxxxxx> writes:
>>>>>>>>
>>>>>>>>> +cc Cody Haas
>>>>>>>>>
>>>>>>>>> Sorry for the resend. I sent the reply in the HTML format.
>>>>>>>>>
>>>>>>>>> On 2/4/2025 4:28 PM, Hou Tao wrote:
>>>>>>>>>> Currently, the update of existing element in hash map involves two
>>>>>>>>>> steps:
>>>>>>>>>> 1) insert the new element at the head of the hash list
>>>>>>>>>> 2) remove the old element
>>>>>>>>>>
>>>>>>>>>> It is possible that the concurrent lookup operation may fail to find
>>>>>>>>>> either the old element or the new element if the lookup operation starts
>>>>>>>>>> before the addition and continues after the removal.
>>>>>>>>>>
>>>>>>>>>> Therefore, replacing the two-step update with an atomic update. After
>>>>>>>>>> the change, the update will be atomic in the perspective of the lookup
>>>>>>>>>> operation: it will either find the old element or the new element.
>>>>>> I'm missing the point.
>>>>>> This "atomic" replacement doesn't really solve anything.
>>>>>> lookup will see one element.
>>>>>> That element could be deleted by another thread.
>>>>>> bucket lock and either two step update or single step
>>>>>> don't change anything from the pov of bpf prog doing lookup.
>>>>> The point is that overwriting an existed element may lead to concurrent
>>>>> lookups return ENOENT as demonstrated by the added selftest and the
>>>>> patch tried to "fix" that. However, it seems using
>>>>> hlist_nulls_replace_rcu() for the overwriting update is still not
>>>>> enough. Because when the lookup procedure found the old element, the old
>>>>> element may be reusing, the comparison of the map key may fail, and the
>>>>> lookup procedure may still return ENOENT.
>>>> you mean l_old == l_new ? I don't think it's possible
>>>> within htab_map_update_elem(),
>>>> but htab_map_delete_elem() doing hlist_nulls_del_rcu()
>>>> then free_htab_elem, htab_map_update_elem, alloc, hlist_nulls_add_head_rcu
>>>> and just deleted elem is reused to be inserted
>>>> into another bucket.
>> No. I mean the following procedure in which the lookup procedure finds
>> the old element and tries to match its key, one update procedure has
>> already deleted the old element, and another update procedure is reusing
>> the old element:
>>
>> lookup procedure A
>> A: find the old element (instead of the new old)
>>
>>               update procedure B
>>               B: delete the old element
>>               update procedure C on the same CPU:
>>               C: reuse the old element (overwrite its key and insert in
>> the same bucket)
>>
>> A: the key is mismatched and return -ENOENT.
> This is fine. It's just normal reuse.
> Orthogonal to 'update as insert+delete' issue.

OK. However, it will break the lookup procedure because it expects it
will return an valid result instead of -ENOENT.
>
>> It can be reproduced by increasing ctx.loop in the selftest.
>>>> I'm not sure whether this new hlist_nulls_replace_rcu()
>>>> primitive works with nulls logic.
>>>>
>>>> So back to the problem statement..
>>>> Are you saying that adding new to a head while lookup is in the middle
>>>> is causing it to miss an element that
>>>> is supposed to be updated assuming atomicity of the update?
>>>> While now update_elem() is more like a sequence of delete + insert?
>>>>
>>>> Hmm.
>>> Yes, exactly that. Because update_elem is actually a delete + insert (actually
>>> an insert + delete, I think?), it is possible for a concurrent lookup to see no
>>> element instead of either the old or new value.
>> Yep.
>>>>> I see. In v2 I will fallback to the original idea: adding a standalone
>>>>> update procedure for htab of maps in which it will atomically overwrite
>>>>> the map_ptr just like array of maps does.
>>>> hold on. is this only for hash-of-maps ?
>>> I believe this was also replicated for hash as well as hash-of-maps. Cody can
>>> confirm, or use the reproducer he has to test that case.
>> The fix for hash-of-maps will be much simpler and the returned map
>> pointer will be correct. For other types of hash map, beside switching
>> to hlist_nulls_replace_rcu(),
> It's been a long time since I looked into rcu_nulls details.
> Pls help me understand that this new replace_rcu_nulls()
> is correct from nulls pov,
> If it is then this patch set may be the right answer to non-atomic update.

If I understand correctly, only the manipulations of ->first pointer and
->next pointer need to take care of nulls pointer.
>
> And for the future, please please focus on "why" part in
> the cover letter and commit logs instead of "what".
>
> Since the only thing I got from the log was:
> "Currently, the update is not atomic
> because the overwrite of existing element happens in a two-steps way,
> but the support of atomic update is feasible".
>
> "is feasible" doesn't explain "why".
>
> Link to xdp-newbie question is nice for additional context,
> but reviewers should not need to go and read some thread somewhere
> to understand "why" part.
> All of it should be in the commit log.

OK. My original thought is that is a reported problem, so an extra link
will be enough. Will try to add more context next time.
>
>> map may still be incorrect (as shown long time ago [1]), so I think
>> maybe for other types of map, the atomic update doesn't matter too much.
>>
>> [1]:
>> https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@xxxxxxxxxxxxxxx/
> A thread from 3 years ago ?! Sorry, it's not helpful to ask
> people to page-in such an old context with lots of follow ups
> that may or may not be relevant today.
Let me reuse part of the diagram above to explain how does the lookup
procedure may return a incorrect value:

lookup procedure A
A: find the old element (instead of the new element)


              update procedure B
              B: delete the old element
              update procedure C on the same CPU:


A: the key is matched and return the value in the element

              C: reuse the old element (overwrite its key and value)

A: read the value (it is incorrect, because it has been reused and
overwritten)