On Thu, Jul 06, 2023 at 08:21:17PM +0800, Hou Tao wrote:
> Hi,
>
> On 7/6/2023 7:54 PM, Anton Protopopov wrote:
> > On Thu, Jul 06, 2023 at 06:49:02PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 7/6/2023 12:01 AM, Anton Protopopov wrote:
> >>> Add a new map test, map_percpu_stats.c, which is checking the correctness of
> >>> map's percpu elements counters. For supported maps the test upserts a number
> >>> of elements, checks the correctness of the counters, then deletes all the
> >>> elements and checks again that the counters sum drops down to zero.
> >>>
> >>> The following map types are tested:
> >>>
> >>> * BPF_MAP_TYPE_HASH, BPF_F_NO_PREALLOC
> >>> * BPF_MAP_TYPE_PERCPU_HASH, BPF_F_NO_PREALLOC
> >>> * BPF_MAP_TYPE_HASH,
> >>> * BPF_MAP_TYPE_PERCPU_HASH,
> >>> * BPF_MAP_TYPE_LRU_HASH
> >>> * BPF_MAP_TYPE_LRU_PERCPU_HASH
> >>> * BPF_MAP_TYPE_LRU_HASH, BPF_F_NO_COMMON_LRU
> >>> * BPF_MAP_TYPE_LRU_PERCPU_HASH, BPF_F_NO_COMMON_LRU
> >>> * BPF_MAP_TYPE_HASH_OF_MAPS
> >>>
> >>> Signed-off-by: Anton Protopopov <aspsk@xxxxxxxxxxxxx>
> >> Acked-by: Hou Tao <houtao1@xxxxxxxxxx>
> >>
> >> With two nits below.
> > Thanks, fixed both for v5.
> Great.
> SNIP
> > +static void delete_and_lookup_batch(int map_fd, void *keys, __u32 count)
> > +{
> > +	static __u8 values[(8 << 10) * MAX_ENTRIES];
> > +	void *in_batch = NULL, *out_batch;
> > +	__u32 save_count = count;
> > +	int ret;
> > +
> > +	ret = bpf_map_lookup_and_delete_batch(map_fd,
> > +					      &in_batch, &out_batch,
> > +					      keys, values, &count,
> > +					      NULL);
> > +
> > +	/*
> > +	 * Despite what uapi header says, lookup_and_delete_batch will return
> > +	 * -ENOENT in case we successfully have deleted all elements, so check
> > +	 * this separately
> > +	 */
> >> It seems it is a bug in __htab_map_lookup_and_delete_batch(). I could
> >> post a patch to fix it if you don't plan to do that by yourself.
> > This should be as simple as
> >
> > @@ -1876,7 +1876,8 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >  	total += bucket_cnt;
> >  	batch++;
> >  	if (batch >= htab->n_buckets) {
> > -		ret = -ENOENT;
> > +		if (!total)
> > +			ret = -ENOENT;
> >  		goto after_loop;
> >  	}
> >  	goto again;
> No. I think changing it to "if (max_count > total) ret = -ENOENT;" will
> be more appropriate, because it means the requested count couldn't be
> fulfilled, and it is also consistent with the comments in
> include/uapi/linux/bpf.h

Say, I have a map of size N and I don't know how many entries there
are. Then I will do

	count = N
	lookup_and_delete(&count)

In this case we will walk through the whole map, reach the
'batch >= htab->n_buckets' condition, and set count to the number of
elements we read. (If, conversely, there's no space to read a whole
bucket, we check for that above and return -ENOSPC.)

> > However, this might already be utilized by some apps to check that they've read
> > all entries. Two local examples are map_tests/map_in_map_batch_ops.c and
> > map_tests/htab_map_batch_ops.c. Another example I know is from BCC tools:
> > https://github.com/iovisor/bcc/blob/master/libbpf-tools/map_helpers.c#L58
> I think these use cases will be fine. Because when the last element has
> been successfully iterated and returned, the out_batch is also updated,
> so if the batch op is called again, -ENOENT will be returned.
> >
> > Can we update comments in include/uapi/linux/bpf.h?
> I think the comments are correct.

Currently we return -ENOENT as an indicator that (a) 'in_batch' is out
of bounds, or (b) we reached the end of the map.
So technically, this is an optimization: if we read elements in a loop
by passing 'in_batch'/'out_batch', then even if we returned 0 in case
(b), the next syscall would return -ENOENT anyway, because the new
'in_batch' would point out of bounds. Returning -ENOENT also makes
sense for an empty map: we reached the end of the map and didn't find
any elements, so we return -ENOENT (in contrast with saying "all is
ok, we read 0 elements"). So from my point of view -ENOENT makes
sense. However, the comments say "Returns zero on success", which
doesn't look true to me, as I think that reading the whole map in one
syscall is a success :)
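
To make the semantics under discussion concrete, here is a minimal
userspace sketch of such a drain loop. It is not code from the patch:
drain_map() is a hypothetical helper, and it assumes libbpf >= 1.0
(which returns -errno directly) and the __u32 batch token used in the
selftests.

#include <errno.h>
#include <bpf/bpf.h>

/* Hypothetical helper: delete and collect all elements of a hash map.
 * 'keys' and 'values' must each have room for max_entries elements.
 * Returns the number of deleted elements, or a negative error.
 */
static int drain_map(int map_fd, void *keys, void *values,
		     __u32 key_size, __u32 value_size, __u32 max_entries)
{
	__u32 batch, count, total = 0;
	int err;

	while (total < max_entries) {
		count = max_entries - total;
		/* A NULL in_batch on the first call starts the walk from
		 * the beginning of the map.
		 */
		err = bpf_map_lookup_and_delete_batch(map_fd,
				total ? &batch : NULL, &batch,
				(__u8 *)keys + (size_t)total * key_size,
				(__u8 *)values + (size_t)total * value_size,
				&count, NULL);

		/* Elements returned by this call are valid even if the
		 * kernel reported -ENOENT: with the current behavior it
		 * means "end of map reached", not "nothing was read".
		 */
		total += count;
		if (err == -ENOENT)
			break;
		if (err)
			return err;	/* e.g. -ENOSPC: the remaining buffer
					 * cannot hold a whole bucket */
	}
	return total;
}

With the 'if (!total)' change above, the call that reaches the end of
the map would instead return 0 with a short count, and the loop would
only see -ENOENT on the extra syscall, because the saved out_batch
then points past the last bucket; that is the behavior Hou describes
for the existing users.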