Re: [PATCH v2 bpf-next 3/5] ftrace: introduce FTRACE_OPS_FL_SHARE_IPMODIFY

Song Liu <songliubraving@xxxxxx> · Fri, 15 Jul 2022 19:49:00 +0000

> On Jul 15, 2022, at 12:12 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> 
> On Fri, 15 Jul 2022 17:42:55 +0000
> Song Liu <songliubraving@xxxxxx> wrote:
> 
> 
>> A quick update and ask for feedback/clarification.
>> 
>> Based on my understanding, you recommended calling ops_func() from 
>> __ftrace_hash_update_ipmodify() and in ops_func() the direct trampoline
>> may make changes to the trampoline. Did I get this right?
>> 
>> 
>> I am going towards this direction, but hit some issue. Specifically, in 
>> __ftrace_hash_update_ipmodify(), ftrace_lock is already locked, so the 
>> direct trampoline cannot easily make changes with 
>> modify_ftrace_direct_multi(), which locks both direct_mutex and 
>> ftrace_mutex. 
>> 
>> One solution would be have no-lock version of all the functions called
>> by modify_ftrace_direct_multi(), but that's a lot of functions and the
>> code will be pretty ugly. The alternative would be the logic in v2: 
>> __ftrace_hash_update_ipmodify() returns -EAGAIN, and we make changes to 
>> the direct trampoline in other places: 
>> 
>> 1) if DIRECT ops attached first, the trampoline is updated in 
>>   prepare_direct_functions_for_ipmodify(), see 3/5 of v2;
>> 
>> 2) if IPMODIFY ops attached first, the trampoline is updated in
>>   bpf_trampoline_update(), see "goto again" path in 5/5 of v2. 
>> 
>> Overall, I think this way is still cleaner. What do you think about this?
> 
> What about if we release the lock when doing the callback?

We can probably unlock ftrace_lock here. But we may break locking order 
with direct mutex (see below).

> 
> Then we just need to make sure things are the same after reacquiring the
> lock, and if they are different, we release the lock again and do the
> callback with the new update. Wash, rinse, repeat, until the state is the
> same before and after the callback with locks acquired?

Personally, I would like to avoid wash-rinse-repeat here.

> 
> This is a common way to handle callbacks that need to do something that
> takes the lock held before doing a callback.
> 
> The reason I say this, is because the more we can keep the accounting
> inside of ftrace the better.
> 
> Wouldn't this need to be done anyway if BPF was first and live kernel
> patching needed the update? An -EAGAIN would not suffice.

prepare_direct_functions_for_ipmodify handles BPF-first-livepatch-later
case. The benefit of prepare_direct_functions_for_ipmodify() is that it 
holds direct_mutex before ftrace_lock, and keeps holding it if necessary. 
This is enough to make sure we don't need the wash-rinse-repeat. 

OTOH, if we wait until __ftrace_hash_update_ipmodify(), we already hold
ftrace_lock, but not direct_mutex. To make changes to bpf trampoline, we
have to unlock ftrace_lock and lock direct_mutex to avoid deadlock. 
However, this means we will need the wash-rinse-repeat. 

For livepatch-first-BPF-later case, we can probably handle this in 
__ftrace_hash_update_ipmodify(), since we hold both direct_mutex and 
ftrace_lock. We can unlock ftrace_lock and update the BPF trampoline. 
It is safe against changes to direct ops, because we are still holding 
direct_mutex. But, is this safe against another IPMODIFY ops? I am not 
sure yet... Also, this is pretty weird because, we are updating a 
direct trampoline before we finish registering it for the first time. 
IOW, we are calling modify_ftrace_direct_multi_nolock for the same 
trampoline before register_ftrace_direct_multi() returns.

The approach in v2 propagates the -EAGAIN to BPF side, so these are two
independent calls of register_ftrace_direct_multi(). This does require
some protocol between ftrace core and its user, but I still think this 
is a cleaner approach. 

Does this make sense?

Thanks,
Song