Re: BPF memory model

On Thu, Sep 07, 2023 at 03:00:56PM -0700, Josh Don wrote:
> Hi Paul,
> 
> I was chatting with Dave Marchevsky about the BPF memory model, and
> had some followup questions you might be able to answer.
> 
> I've been using the built-in RMW operations to do a lot of lockless
> programming, for concurrent BPF-BPF, but also especially for
> userspace-BPF (the latter of which has become a lot more interesting
> with the sched_ext work from Meta). It would of course be nice to
> sometimes lower the synchronization overhead to a hardware barrier or
> a compiler barrier, to allow for general use acquire/release semantics
> (rather than needing to fall back to a lock RMW instruction). I saw
> your presentation from 2021 on this topic here:
> https://lpc.events/event/11/contributions/941/attachments/859/1667/bpf-memory-model.2020.09.22a.pdf
> 
> Has there been any further interest in supporting additional
> kernel-style atomics in BPF that you know of?

This is one of the first that I have heard of.  ;-)

But what BPF programs are you running that are seeing excessive
synchronization overhead?  That will tell us which operations to
start with.  (Or maybe it is time to just add the full Linux-kernel
atomic-operations kitchen sink, but that would not normally be the way
to bet.)

> And on a different BPF note, one thing I wasn't sure about was the
> ability of the CPU to reorder loads and stores across the BPF program
> call boundary. For example, could the load of "z" in the BPF program
> below be reordered before the store to x in the kernel? I'm sure that
> no compiler barrier is ever necessary here since the BPF program is
> compiled separately from the kernel, but I'm not sure whether a
> hardware barrier is necessary.
> <kernel>
> x = 3;
> call_bpf();
>   <bpf>
>   int y = z;

Given that a major goal of BPF is the ability to add low-overhead
programs to code on fastpaths, I would not expect any implicit barriers
in that case.  Consider for example counting the number of calls to a
"hot" function in the Linux kernel, in which case adding full ordering
would incur unacceptable performance degradation.  I would instead
expect that the BPF program would need to add explicit barriers or
ordered RMW operations.
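
To make the distinction concrete, here is a minimal userspace sketch in C11
atomics, not BPF code.  It assumes that C11 release/acquire can stand in for
the kernel's smp_store_release() and smp_load_acquire(), and the names
x_ready, count_hit(), publish_x(), read_x(), and example_usage() are
hypothetical, made up for illustration.  The counter is atomic but unordered,
which is the sort of thing a hot-path counting program would want, while the
flag supplies ordering only where the reader actually depends on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state, for illustration only. */
static int x;                /* plain data, like "x = 3" in the example */
static atomic_bool x_ready;  /* flag published with release semantics */
static atomic_ulong hits;    /* hot-path call counter */

/* Hot-path counting: atomic, but with no ordering guarantees. */
void count_hit(void)
{
	atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

/* Read the counter; relaxed is enough for a statistics read. */
unsigned long hit_count(void)
{
	return atomic_load_explicit(&hits, memory_order_relaxed);
}

/* Writer: the release store orders the write to x before the flag. */
void publish_x(int val)
{
	x = val;
	atomic_store_explicit(&x_ready, true, memory_order_release);
}

/*
 * Reader: the acquire load orders the flag check before the read of x.
 * Returns false if x has not been published yet.
 */
bool read_x(int *out)
{
	if (!atomic_load_explicit(&x_ready, memory_order_acquire))
		return false;
	*out = x;
	return true;
}

/* Single-threaded walk-through of the intended usage. */
void example_usage(void)
{
	int v = 0;
	bool ok;

	ok = read_x(&v);
	assert(!ok);            /* nothing published yet */

	publish_x(3);
	ok = read_x(&v);
	assert(ok && v == 3);   /* flag seen, so x is visible */

	count_hit();
	assert(hit_count() == 1);
}
```

In a real deployment the writer and reader would run concurrently, for
example the kernel on one side and a BPF program or userspace on the other,
and the reader would retry or bail out when read_x() returns false.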

But people will not be shy about correcting me if I am confused on
either of these points!

							Thanx, Paul



