Re: [PATCH bpf-next 7/7] bpf: hashtab memory usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 7, 2023 at 7:34 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Wed, Feb 8, 2023 at 9:56 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Sat, Feb 4, 2023 at 7:56 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > >
> > > On Sat, Feb 4, 2023 at 10:01 AM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> > > >
> > > > Yafang Shao wrote:
> > > > > Get htab memory usage from the htab pointers we have allocated. Some
> > > > > small pointers are ignored as their size are quite small compared with
> > > > > the total size.
> > > > >
> > > > > The result as follows,
> > > > > - before this change
> > > > > 1: hash  name count_map  flags 0x0  <<<< prealloc
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > > 2: hash  name count_map  flags 0x1  <<<< non prealloc, fully set
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > > 3: hash  name count_map  flags 0x1  <<<< non prealloc, non set
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > >
> > > > > The memlock is always a fixed number whatever it is preallocated or
> > > > > not, and whatever the allocated elements number is.
> > > > >
> > > > > - after this change
> > > > > 1: hash  name count_map  flags 0x0  <<<< prealloc
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 109064464B
> > > > > 2: hash  name count_map  flags 0x1  <<<< non prealloc, fully set
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 117464320B
> > > > > 3: hash  name count_map  flags 0x1  <<<< non prealloc, non set
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 16797952B
> > > > >
> > > > > The memlock now is hashtab actually allocated.
> > > > >
> > > > > At worst, the difference can be 10x, for example,
> > > > > - before this change
> > > > > 4: hash  name count_map  flags 0x0
> > > > >         key 4B  value 4B  max_entries 1048576  memlock 8388608B
> > > > >
> > > > > - after this change
> > > > > 4: hash  name count_map  flags 0x0
> > > > >         key 4B  value 4B  max_entries 1048576  memlock 83898640B
> > > > >
> > > >
> > > > This walks the entire map and buckets to get the size? Inside a
> > > > rcu critical section as well :/ it seems.
> > > >
> > >
> > > No, it doesn't walk the entire map and buckets, but just gets one
> > > random element.
> > > So it won't be a problem to do it inside a rcu critical section.
> > >
> > > > What am I missing, if you know how many elements are added (which
> > > > you can track on map updates) how come we can't just calculate the
> > > > memory size directly from this?
> > > >
> > >
> > > It is less accurate and hard to understand. Take non-preallocated
> > > percpu hashtab for example,
> > > The size can be calculated as follows,
> > >     key_size = round_up(htab->map.key_size, 8);
> > >     value_size = round_up(htab->map.value_size, 8);
> > >     pcpu_meta_size = sizeof(struct llist_node) + sizeof(void *);
> > >     usage = ((value_size * num_possible_cpus() +\
> > >                     pcpu_meta_size + key_size) * max_entries
> > >
> > > That is quite unfriendly to the newbies, and may be error-prone.
> >
> > Please do that instead.
>
> I can do it as you suggested, but it seems we shouldn't keep all
> estimates in one place. Because ...
>
> > map_mem_usage callback is a no go as I mentioned earlier.
>
> ...we have to introduce the map_mem_usage callback. Take the lpm_trie
> for example, its usage is
> usage = (sizeof(struct lpm_trie_node) + trie->data_size) * trie->n_entries;

sizeof(struct lpm_trie_node) + trie->data_size + trie->map.value_size.

and it won't count the inner nodes, but _it is ok_.

> I don't think we want  to declare struct lpm_trie_node in kernel/bpf/syscall.c.
> WDYT ?

Good point. Fine. Let's go with callback, but please keep it
to a single function without loops like htab_non_prealloc_elems_size()
and htab_prealloc_elems_size().

Also please implement it for all maps.
Doing it just for hash and arguing that every byte of accuracy matters
while not addressing lpm and other maps doesn't give credibility
to the accuracy argument.





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux