2019-08-14 13:18 UTC-0700 ~ Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
> On Wed, Aug 14, 2019 at 10:12 AM Quentin Monnet
> <quentin.monnet@xxxxxxxxxxxxx> wrote:
>>
>> 2019-08-14 09:58 UTC-0700 ~ Alexei Starovoitov
>> <alexei.starovoitov@xxxxxxxxx>
>>> On Wed, Aug 14, 2019 at 9:45 AM Edward Cree <ecree@xxxxxxxxxxxxxx> wrote:
>>>>
>>>> On 14/08/2019 10:42, Quentin Monnet wrote:
>>>>> 2019-08-13 18:51 UTC-0700 ~ Alexei Starovoitov
>>>>> <alexei.starovoitov@xxxxxxxxx>
>>>>>> The same can be achieved by 'bpftool map dump|grep key|wc -l', no?
>>>>> To some extent (with subtleties for some other map types); and we use
>>>>> a similar command line as a workaround for now. But because of the
>>>>> rate of inserts/deletes in the map, the process often reports a
>>>>> number higher than the maximum number of entries (we observed up to
>>>>> ~750k when max_entries is 500k), even if the map is only half-full on
>>>>> average during the count. In the worst case (though not frequent), an
>>>>> entry is deleted just before we get the next key from it, and
>>>>> iteration starts all over again. So this is not a reliable way to
>>>>> determine how much space is left in the map.
>>>>>
>>>>> I cannot see a solution that would provide a more accurate count from
>>>>> user space while the map is under pressure.
>>>> This might be a really dumb suggestion, but: you want to collect a
>>>> summary statistic over an in-kernel data structure in a single
>>>> syscall, because making a series of syscalls to examine every entry is
>>>> slow and racy. Isn't that exactly a job for an in-kernel virtual
>>>> machine? Could you not supply an eBPF program which the kernel runs on
>>>> each entry in the map, thus supporting people who want to calculate
>>>> something else (mean, min and max, whatever) instead of a count?
>>>
>>> Pretty much my suggestion as well :)
>
> I also support the suggestion to count it from the BPF side. It's a
> flexible and powerful approach, and it doesn't require adding more and
> more nuanced sub-APIs to the kernel to support a subset of bulk
> operations on maps (a subset, because we would expose a count, but what
> about, e.g., p50? There will always be something more that someone
> wants, and it just doesn't scale).

Hi Andrii,

Yes, that makes sense.

>
>>>
>>> It seems the better fix for your NAT threshold is to keep a count of
>>> the elements in the map in a separate global variable that the BPF
>>> program manually increments and decrements. bpftool will dump it just
>>> like a regular single-element map (I believe it doesn't recognize
>>> global variables properly yet), and BTF will be there to pick out
>>> exactly that 'count' variable.
>>
>> In our case it would be with an offloaded map, but yes, I suppose we
>> could keep track of the numbers in a separate map. We'll have a look
>> into this.
>
> See if you can use a global variable; that way you completely eliminate
> any overhead from the BPF side of things, except for the atomic
> increment.

Offloaded maps do not implement the map_direct_value_addr() operation,
so global variables are not supported for them at the moment. I need to
dive deeper into this and see what is required to add that support.

Thanks for your advice!
Quentin
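
P.S. For the archive, here is a rough, untested sketch of the counting
approach suggested above, for a regular (non-offloaded) hash map. All
names in it (struct session_key, nat_sessions, session_count, the two
wrapper functions) are made up for the example, and the map definition
assumes a libbpf version with support for BTF-defined maps:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Illustrative 5-tuple key; the fields are made up for this sketch. */
struct session_key {
	__u32 saddr;
	__u32 daddr;
	__u16 sport;
	__u16 dport;
	__u8  proto;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 500000);
	__type(key, struct session_key);
	__type(value, __u64);
} nat_sessions SEC(".maps");

/* Global variable, placed by libbpf in the object's .bss map. bpftool
 * can dump that map like any single-element map, and BTF gives the
 * variable's name and type. */
__u64 session_count;

static __always_inline int session_add(struct session_key *key, __u64 *val)
{
	/* BPF_NOEXIST: fail if the key already exists, so that only
	 * genuinely new entries bump the counter. */
	int err = bpf_map_update_elem(&nat_sessions, key, val, BPF_NOEXIST);

	if (!err)
		__sync_fetch_and_add(&session_count, 1);
	return err;
}

static __always_inline int session_del(struct session_key *key)
{
	int err = bpf_map_delete_elem(&nat_sessions, key);

	if (!err)
		__sync_fetch_and_add(&session_count, -1);
	return err;
}

char _license[] SEC("license") = "GPL";

With something like this, dumping the .bss map with bpftool shows
session_count, and comparing it to max_entries gives the fill level
without walking the map. One caveat: entries added or deleted from user
space would need the same bookkeeping to keep the counter accurate.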