Re: [RFC PATCH bpf-next v3 1/2] bpf, x64: Fix tailcall infinite loop

On 2023/8/31 21:12, Leon Hwang wrote:
> 
> 
> On 2023/8/31 06:49, Maciej Fijalkowski wrote:
>> On Sat, Aug 26, 2023 at 12:03:12PM +0800, Leon Hwang wrote:
>>>
>>>
>>> On 2023/8/26 01:58, Maciej Fijalkowski wrote:
>>>> On Fri, Aug 25, 2023 at 10:52:15PM +0800, Leon Hwang wrote:
>>>>> Since commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
>>>>> handling in JIT"), tailcalls on x64 work better than before.
>>>>>
>>>>> Since commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
>>>>> for x64 JIT"), tailcalls are able to run in BPF subprograms on x64.
>>>>>
>>>>> Since commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
>>>>> to other BPF programs"), a BPF program is able to trace other BPF programs.
>>>>>
>>>>> How about combining them all together?
>>>>>
>>>>> 1. FENTRY/FEXIT on a BPF subprogram.
>>>>> 2. A tailcall runs in the BPF subprogram.
>>>>> 3. The tailcall calls itself.
>>>>
>>>> I would be interested in seeing broken asm code TBH :)
>>>>
>>>>>
>>>>> As a result, a tailcall infinite loop comes up, and the loop halts the
>>>>> machine.
>>>>>
>>>>> As we know, in the tail call context, the tail_call_cnt is propagated via
>>>>> the stack and the rax register between BPF subprograms. So do the same in
>>>>> trampolines.
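
(For readers less familiar with the tail-call counter, a minimal conceptual
C sketch of the semantics described above; it models behaviour only, not the
JIT-emitted code, and every name in it is illustrative.)

#include <stdio.h>

#define MAX_TAIL_CALL_CNT 33	/* kernel-wide tail call chain limit */

static int run_prog(int *tail_call_cnt);

/* Model of bpf_tail_call(): the jump only happens while the shared
 * counter stays below the cap; otherwise execution falls through. */
static int do_tail_call(int *tail_call_cnt)
{
	if (*tail_call_cnt >= MAX_TAIL_CALL_CNT)
		return -1;			/* tail call refused */
	(*tail_call_cnt)++;
	return run_prog(tail_call_cnt);		/* target sees the same counter */
}

/* The target prog immediately tail-calls itself. */
static int run_prog(int *tail_call_cnt)
{
	return do_tail_call(tail_call_cnt);
}

int main(void)
{
	/* Initialised once in the entry prologue; on x64 it lives on the
	 * stack and is carried in rax across bpf2bpf calls. If a context
	 * entered via a trampoline started over with a fresh counter, the
	 * cap would never be reached -- that is the infinite loop above. */
	int tail_call_cnt = 0;

	run_prog(&tail_call_cnt);
	printf("chain stopped after %d tail calls\n", tail_call_cnt);
	return 0;
}
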
>>>>>
>>>>> Signed-off-by: Leon Hwang <hffilwlqm@xxxxxxxxx>
>>>>> ---
>>>>>  arch/x86/net/bpf_jit_comp.c | 32 ++++++++++++++++++++++++++------
>>>>>  include/linux/bpf.h         |  5 +++++
>>>>>  kernel/bpf/trampoline.c     |  4 ++--
>>>>>  kernel/bpf/verifier.c       | 30 +++++++++++++++++++++++-------
>>>>>  4 files changed, 56 insertions(+), 15 deletions(-)
>>>>>
> 
> [SNIP]
> 
>>>>>  
>>>>> +	if (tgt_prog && tgt_prog->aux->tail_call_reachable) {
>>>>> +		subprog = find_subprog_index(tgt_prog, btf_id);
>>>>> +		tr->flags = subprog > 0 ? BPF_TRAMP_F_TAIL_CALL_CTX : 0;
>>>>> +	}
>>>>
>>>> I kinda forgot trampoline internals so please bear with me.
>>>> Here you are checking actually...what? That current program is a subprog
>>>> of tgt prog? My knee jerk reaction would be to propagate the
>>>> BPF_TRAMP_F_TAIL_CALL_CTX based on just tail_call_reachable, but I need
>>>> some more time to get my head around it again, sorry :<
>>>
>>> Yeah, the current program must be a subprog of tgt prog.
>>>
>>> For example:
>>>
>>> tailcall_subprog() {
>>>   bpf_tail_call_static(&jmp_table, 0);
>>> }
>>>
>>> tailcall_prog() {
>>>   tailcall_subprog();
>>> }
>>>
>>> prog() {
>>>   bpf_tail_call_static(&jmp_table, 0);
>>> }
>>>
>>> jmp_table is populated with tailcall_prog().
>>>
>>> When doing fentry on prog(), there's no tail_call_cnt for fentry to
>>> propagate. As we can see in emit_prologue(), fentry runs before
>>> tail_call_cnt is initialised.
>>>
>>> What about doing fentry on tailcall_prog()? No, it's impossible for the
>>> fentry on tailcall_prog() to run, because the tailcall 'jmp' skips the
>>> fentry on tailcall_prog().
>>>
>>> And when doing fentry on tailcall_subprog(), fentry has to propagate
>>> tail_call_cnt properly.
>>>
>>> In conclusion, the current program must be a subprog of tgt prog.
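
For concreteness, a hedged BPF C sketch of the layout above; the section
names, the map definition and the __noinline marker are illustrative
assumptions, not the actual selftest from patch 2:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

/* keep this as a real BPF subprogram instead of letting clang inline it */
static __noinline int tailcall_subprog(struct __sk_buff *skb)
{
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

SEC("tc")
int tailcall_prog(struct __sk_buff *skb)
{
	return tailcall_subprog(skb);
}

SEC("tc")
int prog(struct __sk_buff *skb)
{
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";

User space would place tailcall_prog()'s fd into slot 0 of jmp_table; an
fentry attached to tailcall_subprog() then runs in a context where
tail_call_cnt must keep propagating.
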
>>
>> The verifier propagates the info about tail call usage through the whole
>> call chain of a given prog, so it doesn't really matter to me where we
>> attach fentry progs. All I'm saying is:
>>
>> 	if (tgt_prog && tgt_prog->aux->tail_call_reachable)
>> 		tr->flags = BPF_TRAMP_F_TAIL_CALL_CTX;
>>
>> should be just fine. I might be missing something, but with the above your
>> selftest does not hang my system.
> 
> I think it's unnecessary to propagate tail call usage info when doing
> fentry on prog(), which is the entry of the whole tail call context. If we
> do propagate in this case, the two extra instructions are executed for
> nothing.

Since it's harmless, I agree with you. I'll change it to:

 	if (tgt_prog && tgt_prog->aux->tail_call_reachable)
 		tr->flags = BPF_TRAMP_F_TAIL_CALL_CTX;

With this update, it's easier to understand BPF_TRAMP_F_TAIL_CALL_CTX.
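
For the record, the case under discussion is attaching fentry to a
subprogram of the tail-call-reachable target. A hedged libbpf sketch of the
user-space side (object and program names follow the example earlier in the
thread and are purely illustrative):

#include <errno.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

/* tgt_obj: loaded object with prog(), tailcall_prog() and jmp_table;
 * fentry_obj: object with an fentry prog "fentry_subprog", opened but
 * not yet loaded. All names are illustrative. */
static int setup_fentry_on_subprog(struct bpf_object *tgt_obj,
				   struct bpf_object *fentry_obj)
{
	struct bpf_program *tc_prog, *fentry_prog;
	int map_fd, tc_prog_fd, key = 0, err;

	/* populate jmp_table[0] with tailcall_prog(), as in the example */
	tc_prog = bpf_object__find_program_by_name(tgt_obj, "tailcall_prog");
	tc_prog_fd = bpf_program__fd(tc_prog);
	map_fd = bpf_object__find_map_fd_by_name(tgt_obj, "jmp_table");
	err = bpf_map_update_elem(map_fd, &key, &tc_prog_fd, BPF_ANY);
	if (err)
		return err;

	/* aim the fentry prog at the subprogram of the target prog;
	 * this must happen before the fentry object is loaded */
	fentry_prog = bpf_object__find_program_by_name(fentry_obj,
						       "fentry_subprog");
	err = bpf_program__set_attach_target(fentry_prog, tc_prog_fd,
					     "tailcall_subprog");
	if (err)
		return err;

	err = bpf_object__load(fentry_obj);
	if (err)
		return err;

	return bpf_program__attach_trace(fentry_prog) ? 0 : -errno;
}

bpf_program__set_attach_target() has to be called before the fentry object
is loaded; passing the target prog fd plus the subprogram name is what makes
the trampoline attach to tailcall_subprog() rather than to the main program.
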

> 
> I confirm that the above selftest is able to hang a VM. I copied test_progs
> along with tailcall*.bpf.o to another VM, which runs Ubuntu 22.04.3 with
> kernel 5.15.0-82-generic, and then ran ./test_progs -t tailcalls. The VM
> then hangs.
> 
> Here's the Ubuntu 22.04.3 VM info:
> # uname -a
> Linux hwang 5.15.0-82-generic #91-Ubuntu SMP Mon Aug 14 14:14:14 UTC
> 2023 x86_64 x86_64 x86_64 GNU/Linux

What I suggest here is to run the selftest in the second patch, not the
above example.

Thanks,
Leon



