Re: Question: BPF maps reliability

On Wed, Nov 2, 2022 at 11:48 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> Hey everyone,
>
> TL;DR Are BPF map operations guaranteed to succeed if the map is
> configured correctly and accesses to the map do not interrupt each
> other? Can this be relied on in the future as well?
>
> I am looking into migrating some cgroup statistics we internally
> maintain to use BPF instead of in-kernel code. I am considering
> several aspects of that, including reliability. With in-kernel code
> things are really simple, we add the data structures containing the
> stats to cgroup controller struct, we update them as appropriate, and
> we export them when needed. With BPF, we need to hook progs to the
> right locations and store the stats in BPF maps (cgroup local
> storage, task local storage, hash tables, and in the future trees,
> etc.).
>
> The question I am asking here is about the reliability of such map
> operations. Looking at the code for lookups and updates for some map
> types, I can see a lot of failure cases. Looking deeper into them it
> *seems* to me like in an ideal scenario nothing should fail. By an
> ideal scenario I mean:
> - The map size is set correctly,
> - There is sufficient memory on the system,
> - We don't use the BPF maps in any progs attached to the BPF maps
> manipulation code itself,
> - We don't use the BPF maps in any progs that can interrupt each other
> (e.g. NMI context).
>
> IOW, there are no cases where we fail because two programs running in
> parallel are trying to access the same map (or map element) or because
> we couldn't acquire a resource that we don't want to wait on (that
> wouldn't result in a deadlock), i.e. situations where we might prefer the
> caller to retry later or where we don't care about one missed
> operation.
>
> Maybe all of this is obvious and I am being paranoid, or maybe there
> are other obvious failure cases that I missed, or maybe this is just a
> dumb question, so I apologize in advance if any of this is true :)

It's a correct summary.
The reliability of maps and local storage is certainly required in some cases.
The "new generation" map types with bpf_obj_new and explicit
map operations will make it easier to audit all the places where
memory allocation can fail or recursion prevention can kick in.
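
As a rough illustration of the "map size is set correctly" condition
from the question, here is a toy userspace model. It is not kernel or
BPF code, and every name in it (toy_map, toy_lookup, toy_update,
toy_stat_add, TOY_MAX_ENTRIES) is hypothetical. It assumes a
fixed-capacity map whose update returns -E2BIG once it is full, which
is how a full BPF hash map is understood to behave, and it counts
failed updates instead of dropping them silently.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; stands in for a map's max_entries. */
#define TOY_MAX_ENTRIES 4

struct toy_entry {
    uint64_t key;
    uint64_t value;
    int used;
};

struct toy_map {
    struct toy_entry slots[TOY_MAX_ENTRIES];
    uint64_t failed_updates; /* updates that could not be stored */
};

/* Return a pointer to the value stored for key, or NULL if absent. */
static uint64_t *toy_lookup(struct toy_map *m, uint64_t key)
{
    for (size_t i = 0; i < TOY_MAX_ENTRIES; i++)
        if (m->slots[i].used && m->slots[i].key == key)
            return &m->slots[i].value;
    return NULL;
}

/* Insert or overwrite key. Returns 0, or -E2BIG when the map is full. */
static int toy_update(struct toy_map *m, uint64_t key, uint64_t value)
{
    uint64_t *v = toy_lookup(m, key);

    if (v) {
        *v = value;
        return 0;
    }
    for (size_t i = 0; i < TOY_MAX_ENTRIES; i++) {
        if (!m->slots[i].used) {
            m->slots[i].key = key;
            m->slots[i].value = value;
            m->slots[i].used = 1;
            return 0;
        }
    }
    return -E2BIG;
}

/* Add delta to the counter for key; record the failure if it cannot be stored. */
static int toy_stat_add(struct toy_map *m, uint64_t key, uint64_t delta)
{
    uint64_t *v = toy_lookup(m, key);
    int err;

    if (v) {
        *v += delta;
        return 0;
    }
    err = toy_update(m, key, delta);
    if (err)
        m->failed_updates++;
    return err;
}
```

With a zero-initialized struct toy_map, the first TOY_MAX_ENTRIES
distinct keys are accepted and later increments to them keep working;
only a new key arriving after the map is full fails, and that failure
shows up in failed_updates. Sizing the map for the expected number of
keys is what removes this failure case.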


