Re: [PATCH bpf] bpf: skip non existing key in generic_map_lookup_batch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 6, 2025 at 2:20 AM Yan Zhai <yan@xxxxxxxxxxxxxx> wrote:
>
> On Thu, Feb 6, 2025 at 12:40 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >
> >
> >
> > Yes, the retry logic doesn't change the previous key. Thanks for the
> > clarifying.
> > > And by "skipping to the next key", it's simply
> > >
> > >   if (err == -ENOENT)
> > >        goto next_key;
> > >
> > > Note the "next_key" label was not in the current codebase. It is only
> > > in my posted patch. I don't think this would break lpm_trie unless I
> > > missed something.
> >
> > I was trying to say that the proposed patch may break the lookup_batch
> > operation for lpm_trie, and let me explain step by step:
> >
> > For LPM trie map:
> > (1) ->map_get_next_key(map, prev_key, key) returns a valid key
> >
> > (2) bpf_map_copy_value() return -ENOMENT
> > It means the key must be deleted concurrently.
> >
>
> I see what you mean now, thanks for the detailed explanation!
>
> So for lpm_trie, if an element is deleted between get_next_key and
> copy_value, then retry would still proceed to its next tree node. But
> with or without retry, I think we cannot prevent the key from cycling
> back to the beginning: an element can be deleted after copy_value, so
> the key becomes invalid. After swap with prev_key and call
> bpf_get_next_key, it cycles back to the leftmost. Similar can happen
> if during retry the prev_key also gets invalidated. IIUC the rcu
> locking in lookup really cannot prevent the tree structure from
> changes.
>
> On the other hand, bpf_map_get_next_key manual already specifies "If a
> key is not found, the operation returns zero and sets the next_key
> pointer to the key of the first element.". IMHO it probably makes more
> sense that, regardless of normal or batch lookup, it is ultimately the
> user's responsibility to properly synchronize, or deduplicate what is
> returned from the kernel. I intend to keep the "skip to next" to save
> the unnecessary complexity here, unless there are strong objections
> that I should not.

Thought about this again. It is interesting that hashmap batch lookup
actually guarantees no duplicate in output today. If we want to
achieve the same for lpm_trie, it should have its own batch lookup
routine, which could properly lock the trie against mutation. For the
generic batch, it is still better to be simple. Let me put this in the
V2 patchset for discussion.

best
Yan

>
> > (3) goto next_key
> > It swaps the prev_key and key
> >
> > (4) ->map_get_next_key(map, prev_key, key) again
> > prev_key points to a non-existing key, for LPM trie it will treat just
> > like prev_key=NULL case, the returned key will be duplicated.
> >





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux