On Wed, Feb 8, 2023 at 12:29 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Tue, Feb 7, 2023 at 7:34 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > On Wed, Feb 8, 2023 at 9:56 AM Alexei Starovoitov > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > On Sat, Feb 4, 2023 at 7:56 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > > > > > On Sat, Feb 4, 2023 at 10:01 AM John Fastabend <john.fastabend@xxxxxxxxx> wrote: > > > > > > > > > > Yafang Shao wrote: > > > > > > Get htab memory usage from the htab pointers we have allocated. Some > > > > > > small pointers are ignored as their size are quite small compared with > > > > > > the total size. > > > > > > > > > > > > The result as follows, > > > > > > - before this change > > > > > > 1: hash name count_map flags 0x0 <<<< prealloc > > > > > > key 16B value 24B max_entries 1048576 memlock 41943040B > > > > > > 2: hash name count_map flags 0x1 <<<< non prealloc, fully set > > > > > > key 16B value 24B max_entries 1048576 memlock 41943040B > > > > > > 3: hash name count_map flags 0x1 <<<< non prealloc, non set > > > > > > key 16B value 24B max_entries 1048576 memlock 41943040B > > > > > > > > > > > > The memlock is always a fixed number whatever it is preallocated or > > > > > > not, and whatever the allocated elements number is. > > > > > > > > > > > > - after this change > > > > > > 1: hash name count_map flags 0x0 <<<< prealloc > > > > > > key 16B value 24B max_entries 1048576 memlock 109064464B > > > > > > 2: hash name count_map flags 0x1 <<<< non prealloc, fully set > > > > > > key 16B value 24B max_entries 1048576 memlock 117464320B > > > > > > 3: hash name count_map flags 0x1 <<<< non prealloc, non set > > > > > > key 16B value 24B max_entries 1048576 memlock 16797952B > > > > > > > > > > > > The memlock now is hashtab actually allocated. > > > > > > > > > > > > At worst, the difference can be 10x, for example, > > > > > > - before this change > > > > > > 4: hash name count_map flags 0x0 > > > > > > key 4B value 4B max_entries 1048576 memlock 8388608B > > > > > > > > > > > > - after this change > > > > > > 4: hash name count_map flags 0x0 > > > > > > key 4B value 4B max_entries 1048576 memlock 83898640B > > > > > > > > > > > > > > > > This walks the entire map and buckets to get the size? Inside a > > > > > rcu critical section as well :/ it seems. > > > > > > > > > > > > > No, it doesn't walk the entire map and buckets, but just gets one > > > > random element. > > > > So it won't be a problem to do it inside a rcu critical section. > > > > > > > > > What am I missing, if you know how many elements are added (which > > > > > you can track on map updates) how come we can't just calculate the > > > > > memory size directly from this? > > > > > > > > > > > > > It is less accurate and hard to understand. Take non-preallocated > > > > percpu hashtab for example, > > > > The size can be calculated as follows, > > > > key_size = round_up(htab->map.key_size, 8); > > > > value_size = round_up(htab->map.value_size, 8); > > > > pcpu_meta_size = sizeof(struct llist_node) + sizeof(void *); > > > > usage = ((value_size * num_possible_cpus() +\ > > > > pcpu_meta_size + key_size) * max_entries > > > > > > > > That is quite unfriendly to the newbies, and may be error-prone. > > > > > > Please do that instead. > > > > I can do it as you suggested, but it seems we shouldn't keep all > > estimates in one place. Because ... > > > > > map_mem_usage callback is a no go as I mentioned earlier. > > > > ...we have to introduce the map_mem_usage callback. Take the lpm_trie > > for example, its usage is > > usage = (sizeof(struct lpm_trie_node) + trie->data_size) * trie->n_entries; > > sizeof(struct lpm_trie_node) + trie->data_size + trie->map.value_size. > Thanks for correcting it. > and it won't count the inner nodes, but _it is ok_. > > > I don't think we want to declare struct lpm_trie_node in kernel/bpf/syscall.c. > > WDYT ? > > Good point. Fine. Let's go with callback, but please keep it > to a single function without loops like htab_non_prealloc_elems_size() > and htab_prealloc_elems_size(). > > Also please implement it for all maps. Sure, I will do it. > Doing it just for hash and arguing that every byte of accuracy matters > while not addressing lpm and other maps doesn't give credibility > to the accuracy argument. -- Regards Yafang