Re: [PATCH v2 bpf-next 5/5] bpf: trampoline: support FTRACE_OPS_FL_SHARE_IPMODIFY

Song Liu <songliubraving@xxxxxx> · Thu, 7 Jul 2022 00:19:07 +0000

> On Jul 6, 2022, at 3:29 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> 
> On Wed, 6 Jul 2022 22:15:47 +0000
> Song Liu <songliubraving@xxxxxx> wrote:
> 
>>> On Jul 6, 2022, at 2:40 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>>> 
>>> On Wed, 6 Jul 2022 21:37:52 +0000
>>> Song Liu <songliubraving@xxxxxx> wrote:
>>> 
>>>>> Can you comment here that returning -EAGAIN will not cause this to repeat.
>>>>> That it will change things where the next try will not return -EAGAIN?
>>>> 
>>>> Hmm, that is not guaranteed here. This conflict is a real race condition:
>>>> an IPMODIFY function (e.g. livepatch) is being registered at the same time
>>>> as something else, for example bpftrace, is updating the BPF trampoline.
>>>> 
>>>> This EAGAIN will propagate to the user of the IPMODIFY function (i.e. livepatch),
>>>> and we need to retry there. In the case of livepatch, the retry is initiated 
>>>> from user space.   
>>> 
>>> We need to be careful here then. If there's a userspace application that
>>> runs at real-time priority and does:
>>> 
>>> 	do {
>>> 		errno = 0;
>>> 		register_bpf();
>>> 	} while (errno != -EAGAIN);  
>> 
>> Actually, do you mean:
>> 
>> 	do {
>> 		errno = 0;
>> 		register_bpf();
>> 	} while (errno == -EAGAIN);
>> 
>> (== -EAGAIN) here?
> 
> Yeah, of course.
> 
>> 
>> In this specific race condition, register_bpf() will succeed, as it already
>> holds tr->mutex. But the IPMODIFY (livepatch) side will fail and retry.
> 
> What else takes the tr->mutex ?

tr->mutex is the local mutex of a single BPF trampoline; we only need to take
it when we make changes to the trampoline (add/remove fentry/fexit programs).
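A minimal user-space model of that locking scope, assuming a pthread mutex stands in for tr->mutex; struct tramp, TRAMP_INIT, tramp_add_prog() and tramp_remove_prog() are hypothetical names, not kernel APIs. Because each trampoline carries its own lock, a change to one trampoline never waits on another.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a BPF trampoline; each one has its own lock. */
struct tramp {
	pthread_mutex_t mutex;	/* models tr->mutex */
	int nr_progs;		/* attached fentry/fexit programs */
};

#define TRAMP_INIT { PTHREAD_MUTEX_INITIALIZER, 0 }

/* Attaching a program changes the trampoline, so it takes the lock. */
static void tramp_add_prog(struct tramp *tr)
{
	pthread_mutex_lock(&tr->mutex);
	tr->nr_progs++;
	/* ...the trampoline image would be rebuilt and re-registered here... */
	pthread_mutex_unlock(&tr->mutex);
}

/* Detaching a program is also a change, so it takes the same lock. */
static void tramp_remove_prog(struct tramp *tr)
{
	pthread_mutex_lock(&tr->mutex);
	assert(tr->nr_progs > 0);
	tr->nr_progs--;
	pthread_mutex_unlock(&tr->mutex);
}
```

For example, while one thread holds trampoline A's lock to attach a program, a change to trampoline B can still take B's lock immediately.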

> 
> If, when this runs, it preempts anything else taking that mutex, then it
> needs to be careful.
> 
> You said this can happen when the live patch came first. This isn't racing
> against live patch, it's racing against anything that takes the tr->mutex
> and then adds a bpf trampoline to a location that has a live patch.

There are a few scenarios here:
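1. A live patch is already applied, then a BPF trampoline is registered on
the same function. In bpf_trampoline_update(), register_fentry() returns
-EAGAIN, and bpf_trampoline_update() resolves this itself by regenerating the
trampoline so that it can share the function with the live patch, then
retrying the registration. This case needs no retry from user space.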

2. A BPF trampoline is already registered, then a live patch is applied to
the same function. The live patch code needs to ask the BPF trampoline to
prepare itself before the live patch is applied. This is done through
bpf_tramp_ftrace_ops_func().

2.1 If nothing else is modifying the trampoline at the same time, 
bpf_tramp_ftrace_ops_func will succeed. 

2.2 In rare cases, if something else is holding tr->mutex to make changes to
the trampoline (add/remove fentry/fexit programs, etc.), mutex_trylock() in
bpf_tramp_ftrace_ops_func() will fail, and the live patch will fail with
-EAGAIN. However, the change to the BPF trampoline will still succeed. Live
patch commonly retries, so we just need to retry the live patch when no one
is making changes to the BPF trampoline in parallel.
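Scenario 2 can be modelled in user space as a minimal sketch, assuming a pthread mutex in place of tr->mutex; struct tramp, tramp_prepare_for_ipmodify() and livepatch_apply() are hypothetical names. As I understand the patch, the IPMODIFY path only trylocks tr->mutex because it runs with ftrace's direct_mutex already held, the reverse of the normal tr->mutex => direct_mutex order. So a concurrent trampoline update makes it return -EAGAIN while the update itself completes.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one BPF trampoline; the mutex stands in for tr->mutex. */
struct tramp {
	pthread_mutex_t mutex;
	bool share_ipmodify;	/* set once prepared to coexist with livepatch */
};

/*
 * Models the IPMODIFY path (scenario 2). It must not block on tr->mutex,
 * so it only tries the lock and reports -EAGAIN on contention.
 */
static int tramp_prepare_for_ipmodify(struct tramp *tr)
{
	if (pthread_mutex_trylock(&tr->mutex) != 0)
		return -EAGAIN;		/* 2.2: a concurrent update holds the lock */
	tr->share_ipmodify = true;	/* 2.1: the trampoline would be regenerated here */
	pthread_mutex_unlock(&tr->mutex);
	return 0;
}

/* Models the livepatch side: retry up to max_tries times while -EAGAIN. */
static int livepatch_apply(struct tramp *tr, int max_tries)
{
	int err = -EAGAIN;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries && err == -EAGAIN; i++)
		err = tramp_prepare_for_ipmodify(tr);
	return err;
}
```

If the retry loop lives in user space, as with livepatch, the caller would see errno == EAGAIN: user-space errno holds the positive value, so a loop testing errno == -EAGAIN would never retry.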

> 
>> 
>> Since both livepatch and bpf trampoline changes are rare operations, I think 
>> the chance of the race condition is low enough. 
>> 
>> Does this make sense?
>> 
> 
> It's low, and if it is also a privileged operation then there's less to be
> concerned about.

Both live patch and BPF trampoline are privileged operations. 

> If it is not, then we could have a way to deadlock the
> system. I'm more concerned that this will lead to a CVE than it just
> happening randomly. In other words, it only takes something that can run at
> a real-time priority to connect to a live patch location, and something
> that runs at a low priority to take a tr->mutex. If an attacker has both,
> then it can pin both to a CPU and then cause the deadlock to the system.
> 
> One hack to fix this is to add a msleep(1) in the failed case of the
> trylock. This will at least give the owner of the lock a millisecond to
> release it. This was what the RT patch used to do with spin_trylock() calls
> that were converted to mutexes (and we worked hard to remove all of them).

The fix is really simple, but I still think we don't need it. We only hit
the trylock case for something with IPMODIFY, and a non-privileged user
should not be able to register that, right?
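For reference, a minimal user-space sketch of the msleep(1) back-off quoted above, assuming nanosleep() as the analogue of the kernel's msleep(1) and a plain pthread mutex in place of tr->mutex; sleep_1ms() and trylock_or_backoff() are hypothetical helpers. In the kernel the change would presumably be a single msleep(1) before returning -EAGAIN from bpf_tramp_ftrace_ops_func().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for tr->mutex. */
static pthread_mutex_t tr_mutex = PTHREAD_MUTEX_INITIALIZER;

/* User-space analogue of msleep(1): sleep for about one millisecond. */
static void sleep_1ms(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000L };

	nanosleep(&ts, NULL);
}

/*
 * Trylock that backs off on failure, so a high-priority caller retrying on
 * -EAGAIN gives a (possibly lower-priority) lock owner time to release it.
 */
static int trylock_or_backoff(pthread_mutex_t *m)
{
	assert(m != NULL);
	if (pthread_mutex_trylock(m) == 0)
		return 0;
	sleep_1ms();
	return -EAGAIN;
}
```

The sleep does not make the retry loop correct by itself; it only stops a real-time caller from monopolizing the CPU that the lock owner needs in order to make progress.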

Thanks,
Song