Re: [PATCH] BPF: Disable on PREEMPT_RT

Clark Williams <williams@xxxxxxxxxx> · Thu, 17 Oct 2019 21:49:17 -0500

+acme

On Thu, 17 Oct 2019 23:54:07 +0200 (CEST)
Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Thu, 17 Oct 2019, David Miller wrote:
> 
> > From: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> > Date: Thu, 17 Oct 2019 17:40:21 +0200
> >   
> > > On 2019-10-17 16:53:58 [+0200], Daniel Borkmann wrote:  
> > >> On Thu, Oct 17, 2019 at 11:05:01AM +0200, Sebastian Andrzej Siewior wrote:  
> > >> > Disable BPF on PREEMPT_RT because
> > >> > - it allocates and frees memory in atomic context
> > >> > - it uses up_read_non_owner()
> > >> > - BPF_PROG_RUN() expects to be invoked in non-preemptible context  
> > >> 
> > >> For the latter you'd also need to disable seccomp-BPF and everything
> > >> cBPF related as they are /all/ invoked via BPF_PROG_RUN() ...  
> > > 
> > > I looked at tracing and it depended on BPF_SYSCALL so I assumed they all
> > > do… Now looking for BPF_PROG_RUN() there is PPP_FILTER,
> > > NET_TEAM_MODE_LOADBALANCE and probably more.  I didn't find a symbol for
> > > seccomp-BPF. 
> > > Would it make sense to override BPF_PROG_RUN() and make each caller fail
> > > instead? Other recommendations?  
> > 
> > I hope you understand that basically you are disabling any packet sniffing
> > on the system with this patch you are proposing.
> > 
> > This means no tcpdump, no wireshark, etc.  They will all become
> > non-functional.
> > 
> > Turning off BPF just because PREEMPT_RT is enabled is a non-starter; it is
> > absolutely essential functionality for a Linux system at this point.  
> 
> I'm all ears for an alternative solution. Here are the pain points:
> 
>   #1) BPF disables preemption unconditionally with no way to do a proper RT
>       substitution like most other infrastructure in the kernel provides
>       via spinlocks or other locking primitives.

As I understand it, BPF programs cannot loop and are limited to 4096 instructions.
Has anyone done any timing to see just how much having preemption off while a
BPF program executes is going to affect us? Are we talking 1us, 50us, or longer?
I wonder if there's some instrumentation we could use to determine the maximum time
spent running a BPF program. Maybe some perf mojo...
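
One rough starting point, as a sketch only: kernels from 5.1 on have a
kernel.bpf_stats_enabled sysctl, and with it set bpftool reports run_time_ns
and run_cnt per loaded program. That gives an average per run rather than the
worst case we actually care about for RT, so treat it as a first-order number.
The helper name avg_runtime_ns() below is made up for illustration, and the
input is assumed to be the output of `bpftool -j prog show`.

```python
import json


def avg_runtime_ns(bpftool_json):
    """Return {program id: average ns per run} from bpftool JSON output.

    Assumes each entry may carry "run_time_ns" and "run_cnt", which bpftool
    only emits when kernel.bpf_stats_enabled=1 and the program has run.
    """
    averages = {}
    for prog in json.loads(bpftool_json):
        runs = prog.get("run_cnt", 0)
        if runs:  # skip programs with no recorded runs or no stats
            averages[prog["id"]] = prog.get("run_time_ns", 0) / runs
    return averages


# Hand-written sample input, not real measurements.
sample = '[{"id": 7, "run_time_ns": 1000, "run_cnt": 4}, {"id": 8}]'
print(avg_runtime_ns(sample))
```

Feeding it real data would mean enabling the sysctl, letting the workload run
for a while, and passing in the bpftool output. A maximum would still need
something else, e.g. a preemption-off latency tracer.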

> 
>   #2) BPF does allocations in atomic contexts, which is a dubious decision
>       even for non RT. That's related to #1

I guess my question here is, are the allocations done on behalf of an about-to-run
BPF program, or as a result of executing BPF code?  Is it something we might be able
to satisfy from a pre-allocated pool rather than kmalloc()? Ok, I need to go dive
into BPF a bit deeper.

> 
>   #3) BPF uses the up_read_non_owner() hackery which was only invented to
>       deal with already existing horrors and not meant to be proliferated.
> 
>       Yes, I know it's an existing facility ....

I'm sure I'll regret asking this, but why is up_read_non_owner() a horror? I mean,
I get the fundamental wrongness of having someone that's not the owner of a semaphore
performing an 'up' on it, but is there an RT-specific reason that it's bad? Is it
totally a blocker for using BPF with RT or is it something we should fix over time?

> 
> TBH, I have no idea how to deal with those things. So the only way forward
> for RT right now is to disable the whole thing.
> 
> Clark might have some insight from the product side for you how much that
> impacts usability.
> 
> Thanks,
> 
> 	tglx

Clark is only just starting his journey with BPF, so not an expert.

I do think that we (RT) are going to have to co-exist with BPF, if only due to the
increased use of XDP. I also think that other sub-systems will start to
employ BPF for production purposes (as opposed to debug/analysis which is
how we generally look at tracing, packet sniffing, etc.). I think we *have* to
figure out how to co-exist. 

Guess my "hey, that looks interesting, think I'll leisurely read up on it" just got
a little less leisurely. I'm out most of the day tomorrow but I'll catch up on email
over the weekend.

Clark

-- 
The United States Coast Guard
Ruining Natural Selection since 1790