Re: [PATCH v4 bpf-next 5/6] selftests/bpf: test map percpu stats

Hou Tao <houtao@xxxxxxxxxxxxxxx> · Fri, 7 Jul 2023 09:41:03 +0800

Hi,

On 7/6/2023 8:57 PM, Anton Protopopov wrote:
> On Thu, Jul 06, 2023 at 08:21:17PM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 7/6/2023 7:54 PM, Anton Protopopov wrote:
>>
SNIP
>>> +static void delete_and_lookup_batch(int map_fd, void *keys, __u32 count)
>>> +{
>>> +	static __u8 values[(8 << 10) * MAX_ENTRIES];
>>> +	void *in_batch = NULL, *out_batch;
>>> +	__u32 save_count = count;
>>> +	int ret;
>>> +
>>> +	ret = bpf_map_lookup_and_delete_batch(map_fd,
>>> +					      &in_batch, &out_batch,
>>> +					      keys, values, &count,
>>> +					      NULL);
>>> +
>>> +	/*
>>> +	 * Despite what uapi header says, lookup_and_delete_batch will return
>>> +	 * -ENOENT in case we successfully have deleted all elements, so check
>>> +	 * this separately
>>> +	 */
>>>> It seems it is a bug in __htab_map_lookup_and_delete_batch(). I could
>>>> post a patch to fix it if you don't plan to do that by yourself.
>>> This should be as simple as
>>>
>>> @@ -1876,7 +1876,8 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>>>         total += bucket_cnt;
>>>         batch++;
>>>         if (batch >= htab->n_buckets) {
>>> -               ret = -ENOENT;
>>> +               if (!total)
>>> +                       ret = -ENOENT;
>>>                 goto after_loop;
>>>         }
>>>         goto again;
>> No. I think changing it to "if (max_count > total) ret = -ENOENT;" will
>> be more appropriate, because it means the requested count couldn't been
>> fulfilled and it is also consistent with the comments in 
>> include/uapi/linux/bpf.h
> Say, I have a map of size N and I don't know how many entries there are.
> Then I will do
>
>     count=N
>     lookup_and_delete(&count)
>
> In this case we will walk through the whole map, reach the 'batch >=
> htab->n_buckets', and set the count to the number of elements we read.
>
> (If, in opposite, there's no space to read a whole bucket, then we check this
> above and return -ENOSPC.)
>
>>> However, this might be already utilized by some apps to check that they've read
>>> all entries. Two local examples are map_tests/map_in_map_batch_ops.c and
>>> map_tests/htab_map_batch_ops.c. Another example I know is from BCC tools:
>>> https://github.com/iovisor/bcc/blob/master/libbpf-tools/map_helpers.c#L58
>> I think these use cases will be fine. Because when the last element has
>> been successfully iterated and returned, the out_batch is also updated,
>> so if the batch op is called again, -ENOENT will be returned.
>>> Can we update comments in include/uapi/linux/bpf.h?
>> I think the comments are correct.
> Currently we return -ENOENT as an indicator that (a) 'in_batch' is out of
> bounds (b) we reached the end of map. So technically, this is an optimization,
> as if we read elements in a loop by passing 'in_batch', 'out_batch', even if we
> return 0 in case (b), the next syscall would return -ENOENT, because the new
> 'in_batch' would point to out of bounds.
>
> This also makes sense for a map which is empty: we reached the end of map,
> didn't find any elements, so we're returning -ENOENT (in contrast with saying
> "all is ok, we read 0 elements").
>
> So from my point of view -ENOENT makes sense. However, comments say "Returns
> zero on success" which doesn't look true to me as I think that reading the
> whole map in one syscall is a success :)

I get your point. The current implementation of BPF_MAP_LOOKUP_BATCH
does the following two things:
1) returns 0 when the whole map has not been iterated but there is no
space for current bucket.
2) doesn't return 0 when the whole map has been iterated successfully
(and the requested count is fulfilled)

For 1) I prefer to update the comments in uapi. If instead we fix the
implementation, we may break the existed users which need to check
ENOSPC to continue the batch op.
For 2) I don't have a preference. Both updating the comments and
implementation are fine to me.

WDYT ?