Re: [PATCH bpf-next 7/7] bpf: hashtab memory usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 8, 2023 at 12:29 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Feb 7, 2023 at 7:34 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > On Wed, Feb 8, 2023 at 9:56 AM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Sat, Feb 4, 2023 at 7:56 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > > >
> > > > On Sat, Feb 4, 2023 at 10:01 AM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> > > > >
> > > > > Yafang Shao wrote:
> > > > > > Get htab memory usage from the htab pointers we have allocated. Some
> > > > > > small pointers are ignored as their size are quite small compared with
> > > > > > the total size.
> > > > > >
> > > > > > The result as follows,
> > > > > > - before this change
> > > > > > 1: hash  name count_map  flags 0x0  <<<< prealloc
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > > > 2: hash  name count_map  flags 0x1  <<<< non prealloc, fully set
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > > > 3: hash  name count_map  flags 0x1  <<<< non prealloc, non set
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > > >
> > > > > > The memlock is always a fixed number whatever it is preallocated or
> > > > > > not, and whatever the allocated elements number is.
> > > > > >
> > > > > > - after this change
> > > > > > 1: hash  name count_map  flags 0x0  <<<< prealloc
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 109064464B
> > > > > > 2: hash  name count_map  flags 0x1  <<<< non prealloc, fully set
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 117464320B
> > > > > > 3: hash  name count_map  flags 0x1  <<<< non prealloc, non set
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 16797952B
> > > > > >
> > > > > > The memlock now is hashtab actually allocated.
> > > > > >
> > > > > > At worst, the difference can be 10x, for example,
> > > > > > - before this change
> > > > > > 4: hash  name count_map  flags 0x0
> > > > > >         key 4B  value 4B  max_entries 1048576  memlock 8388608B
> > > > > >
> > > > > > - after this change
> > > > > > 4: hash  name count_map  flags 0x0
> > > > > >         key 4B  value 4B  max_entries 1048576  memlock 83898640B
> > > > > >
> > > > >
> > > > > This walks the entire map and buckets to get the size? Inside a
> > > > > rcu critical section as well :/ it seems.
> > > > >
> > > >
> > > > No, it doesn't walk the entire map and buckets, but just gets one
> > > > random element.
> > > > So it won't be a problem to do it inside a rcu critical section.
> > > >
> > > > > What am I missing, if you know how many elements are added (which
> > > > > you can track on map updates) how come we can't just calculate the
> > > > > memory size directly from this?
> > > > >
> > > >
> > > > It is less accurate and hard to understand. Take non-preallocated
> > > > percpu hashtab for example,
> > > > The size can be calculated as follows,
> > > >     key_size = round_up(htab->map.key_size, 8);
> > > >     value_size = round_up(htab->map.value_size, 8);
> > > >     pcpu_meta_size = sizeof(struct llist_node) + sizeof(void *);
> > > >     usage = ((value_size * num_possible_cpus() +\
> > > >                     pcpu_meta_size + key_size) * max_entries
> > > >
> > > > That is quite unfriendly to the newbies, and may be error-prone.
> > >
> > > Please do that instead.
> >
> > I can do it as you suggested, but it seems we shouldn't keep all
> > estimates in one place. Because ...
> >
> > > map_mem_usage callback is a no go as I mentioned earlier.
> >
> > ...we have to introduce the map_mem_usage callback. Take the lpm_trie
> > for example, its usage is
> > usage = (sizeof(struct lpm_trie_node) + trie->data_size) * trie->n_entries;
>
> sizeof(struct lpm_trie_node) + trie->data_size + trie->map.value_size.
>

Thanks for correcting it.

> and it won't count the inner nodes, but _it is ok_.
>
> > I don't think we want  to declare struct lpm_trie_node in kernel/bpf/syscall.c.
> > WDYT ?
>
> Good point. Fine. Let's go with callback, but please keep it
> to a single function without loops like htab_non_prealloc_elems_size()
> and htab_prealloc_elems_size().
>
> Also please implement it for all maps.

Sure, I will do it.

> Doing it just for hash and arguing that every byte of accuracy matters
> while not addressing lpm and other maps doesn't give credibility
> to the accuracy argument.



-- 
Regards
Yafang




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux