On Thu, Jul 06, 2023 at 08:21:17PM +0800, Hou Tao wrote:
> Hi,
>
> On 7/6/2023 7:54 PM, Anton Protopopov wrote:
> > On Thu, Jul 06, 2023 at 06:49:02PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 7/6/2023 12:01 AM, Anton Protopopov wrote:
> >>> Add a new map test, map_percpu_stats.c, which is checking the correctness of
> >>> map's percpu elements counters. For supported maps the test upserts a number
> >>> of elements, checks the correctness of the counters, then deletes all the
> >>> elements and checks again that the counters sum drops down to zero.
> >>>
> >>> The following map types are tested:
> >>>
> >>> * BPF_MAP_TYPE_HASH, BPF_F_NO_PREALLOC
> >>> * BPF_MAP_TYPE_PERCPU_HASH, BPF_F_NO_PREALLOC
> >>> * BPF_MAP_TYPE_HASH,
> >>> * BPF_MAP_TYPE_PERCPU_HASH,
> >>> * BPF_MAP_TYPE_LRU_HASH
> >>> * BPF_MAP_TYPE_LRU_PERCPU_HASH
> >>> * BPF_MAP_TYPE_LRU_HASH, BPF_F_NO_COMMON_LRU
> >>> * BPF_MAP_TYPE_LRU_PERCPU_HASH, BPF_F_NO_COMMON_LRU
> >>> * BPF_MAP_TYPE_HASH_OF_MAPS
> >>>
> >>> Signed-off-by: Anton Protopopov <aspsk@xxxxxxxxxxxxx>
> >> Acked-by: Hou Tao <houtao1@xxxxxxxxxx>
> >>
> >> With two nits below.
> > Thanks, fixed both for v5.
> Great.
> SNIP
> > +static void delete_and_lookup_batch(int map_fd, void *keys, __u32 count)
> > +{
> > +	static __u8 values[(8 << 10) * MAX_ENTRIES];
> > +	void *in_batch = NULL, *out_batch;
> > +	__u32 save_count = count;
> > +	int ret;
> > +
> > +	ret = bpf_map_lookup_and_delete_batch(map_fd,
> > +					      &in_batch, &out_batch,
> > +					      keys, values, &count,
> > +					      NULL);
> > +
> > +	/*
> > +	 * Despite what uapi header says, lookup_and_delete_batch will return
> > +	 * -ENOENT in case we successfully have deleted all elements, so check
> > +	 * this separately
> > +	 */
> >> It seems it is a bug in __htab_map_lookup_and_delete_batch(). I could
> >> post a patch to fix it if you don't plan to do that by yourself.
> > This should be as simple as
> >
> > @@ -1876,7 +1876,8 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >  	total += bucket_cnt;
> >  	batch++;
> >  	if (batch >= htab->n_buckets) {
> > -		ret = -ENOENT;
> > +		if (!total)
> > +			ret = -ENOENT;
> >  		goto after_loop;
> >  	}
> >  	goto again;
> No. I think changing it to "if (max_count > total) ret = -ENOENT;" will
> be more appropriate, because it means the requested count couldn't be
> fulfilled, and it is also consistent with the comments in
> include/uapi/linux/bpf.h

Say, I have a map of size N and I don't know how many entries there
are. Then I will do

	count = N
	lookup_and_delete(&count)

In this case we will walk through the whole map, reach the
'batch >= htab->n_buckets' condition, and set count to the number of
elements we read. (If, conversely, there's no space to read a whole
bucket, we check for that above and return -ENOSPC.)

> > However, this might already be utilized by some apps to check that they've read
> > all entries. Two local examples are map_tests/map_in_map_batch_ops.c and
> > map_tests/htab_map_batch_ops.c. Another example I know is from BCC tools:
> > https://github.com/iovisor/bcc/blob/master/libbpf-tools/map_helpers.c#L58
> I think these use cases will be fine. Because when the last element has
> been successfully iterated and returned, the out_batch is also updated,
> so if the batch op is called again, -ENOENT will be returned.
> >
> > Can we update comments in include/uapi/linux/bpf.h?
> I think the comments are correct.

Currently we return -ENOENT as an indicator that (a) 'in_batch' is out
of bounds, or (b) we reached the end of the map.
So technically, this is an optimization: if we read elements in a loop
by passing 'in_batch'/'out_batch', then even if we returned 0 in case
(b), the next syscall would return -ENOENT anyway, because the new
'in_batch' would point out of bounds. Returning -ENOENT also makes
sense for an empty map: we reached the end of the map and didn't find
any elements, so we return -ENOENT (in contrast with saying "all is
ok, we read 0 elements"). So from my point of view -ENOENT makes
sense. However, the comments say "Returns zero on success", which
doesn't look true to me, as I think that reading the whole map in one
syscall is a success :)
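
To make the semantics under discussion concrete, here is a minimal
userspace sketch of such a drain loop. It is not code from the patch:
drain_map() is a hypothetical helper, and it assumes libbpf >= 1.0
(which returns -errno directly) and the __u32 batch token used in the
selftests.

#include <errno.h>
#include <bpf/bpf.h>

/* Hypothetical helper: delete and collect all elements of a hash map.
 * 'keys' and 'values' must each have room for max_entries elements.
 * Returns the number of deleted elements, or a negative error.
 */
static int drain_map(int map_fd, void *keys, void *values,
		     __u32 key_size, __u32 value_size, __u32 max_entries)
{
	__u32 batch, count, total = 0;
	int err;

	while (total < max_entries) {
		count = max_entries - total;
		/* A NULL in_batch on the first call starts the walk from
		 * the beginning of the map.
		 */
		err = bpf_map_lookup_and_delete_batch(map_fd,
				total ? &batch : NULL, &batch,
				(__u8 *)keys + (size_t)total * key_size,
				(__u8 *)values + (size_t)total * value_size,
				&count, NULL);

		/* Elements returned by this call are valid even if the
		 * kernel reported -ENOENT: with the current behavior it
		 * means "end of map reached", not "nothing was read".
		 */
		total += count;
		if (err == -ENOENT)
			break;
		if (err)
			return err;	/* e.g. -ENOSPC: the remaining buffer
					 * cannot hold a whole bucket */
	}
	return total;
}

With the 'if (!total)' change above, the call that reaches the end of
the map would instead return 0 with a short count, and the loop would
only see -ENOENT on the extra syscall, because the saved out_batch
then points past the last bucket; that is the behavior Hou describes
for the existing users.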