Re: Question: BPF maps reliability

On Thu, Nov 3, 2022 at 11:28 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Nov 2, 2022 at 11:48 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> > Hey everyone,
> >
> > TL;DR Are BPF map operations guaranteed to succeed if the map is
> > configured correctly and accesses to the map do not interrupt each
> > other? Can this be relied on in the future as well?
> >
> > I am looking into migrating some cgroup statistics we internally
> > maintain to use BPF instead of in-kernel code. I am considering
> > several aspects of that, including reliability. With in-kernel code
> > things are really simple: we add the data structures containing the
> > stats to the cgroup controller struct, we update them as appropriate, and
> > we export them when needed. With BPF, we need to hook progs to the
> > right locations and store the stats in BPF maps (cgroup local
> > storages, task local storages, hash tables, and in the future, trees),
> > etc.
> >
> > The question I am asking here is about the reliability of such map
> > operations. Looking at the code for lookups and updates for some map
> > types, I can see a lot of failure cases. Looking deeper into them it
> > *seems* to me like in an ideal scenario nothing should fail. By an
> > ideal scenario I mean:
> > - The map size is set correctly,
> > - There is sufficient memory on the system,
> > - We don't use the BPF maps in any progs attached to the BPF maps
> > manipulation code itself,
> > - We don't use the BPF maps in any progs that can interrupt each other
> > (e.g. NMI context).
> >
> > IOW, there are no cases where we fail because two programs running in
> > parallel are trying to access the same map (or map element) or because
> > we couldn't acquire a resource that we don't want to wait on (that
> > wouldn't result in a deadlock), i.e. situations where we might prefer the
> > caller to retry later or where we don't care about one missed
> > operation.
> >
> > Maybe all of this is obvious and I am being paranoid, or maybe there
> > are other obvious failure cases that I missed, or maybe this is just a
> > dumb question, so I apologize in advance if any of this is true :)
>
> It's a correct summary.
> The reliability of map and local storage is certainly required in some cases.

Thanks for taking a look and confirming my thoughts!

> The "new generation" map types with bpf_obj_new and explicit
> map operations will make it easier to audit all the code where
> memory allocation can fail or recursion prevention can kick in.

Makes sense, looking forward to that!
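
For anyone reading along in the archive, here is a rough userspace C
sketch of the failure classes discussed above. It is a toy model, not
kernel code: toy_map, toy_map_add, toy_map_lookup and TOY_MAX_ENTRIES
are hypothetical names made up for illustration, the "busy" flag only
stands in for recursion prevention, and the errno values are my
assumption of what a caller would plausibly see (-E2BIG for a full
map, -EBUSY when recursion prevention kicks in). Memory allocation
failure is left out because the toy map is statically sized.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, NOT kernel code: every name here is made up. */
#define TOY_MAX_ENTRIES 4	/* stands in for a map's max_entries */

struct toy_entry {
	bool used;
	unsigned long key;	/* e.g. a cgroup id */
	unsigned long value;	/* e.g. a per-cgroup counter */
};

struct toy_map {
	struct toy_entry slots[TOY_MAX_ENTRIES];
	bool busy;		/* stands in for recursion prevention */
};

/* Return the entry for key, or NULL if the key is not present. */
static struct toy_entry *toy_map_lookup(struct toy_map *m, unsigned long key)
{
	for (size_t i = 0; i < TOY_MAX_ENTRIES; i++)
		if (m->slots[i].used && m->slots[i].key == key)
			return &m->slots[i];
	return NULL;
}

/*
 * Add delta to the counter stored under key, creating the entry if needed.
 * Returns 0 on success, -EBUSY if the map is already being modified
 * (assumed analogue of recursion prevention), or -E2BIG if the key is
 * new and the map is full (the "map size is set correctly" condition).
 */
static int toy_map_add(struct toy_map *m, unsigned long key,
		       unsigned long delta)
{
	struct toy_entry *e;
	int ret = 0;

	if (m->busy)
		return -EBUSY;
	m->busy = true;

	e = toy_map_lookup(m, key);
	if (!e) {
		/* Key is new: look for a free slot. */
		for (size_t i = 0; i < TOY_MAX_ENTRIES && !e; i++)
			if (!m->slots[i].used)
				e = &m->slots[i];
		if (!e) {
			ret = -E2BIG;
			goto out;
		}
		e->used = true;
		e->key = key;
		e->value = 0;
	}
	e->value += delta;
out:
	m->busy = false;
	return ret;
}
```

In this model, as long as no more than TOY_MAX_ENTRIES distinct keys
are used and nothing re-enters toy_map_add while it is running, every
call returns 0, which mirrors the "ideal scenario" in the question.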
