On Sun, Aug 1, 2021 at 1:38 AM Johan Almbladh <johan.almbladh@xxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Jul 30, 2021 at 12:48 AM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Thu, Jul 29, 2021 at 3:29 PM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > On Thu, Jul 29, 2021 at 2:38 PM Johan Almbladh
> > > <johan.almbladh@xxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Jul 28, 2021 at 9:13 PM Yonghong Song <yhs@xxxxxx> wrote:
> > > > > I also checked arm/arm64 jit. I saw the following comments:
> > > > >
> > > > > /* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
> > > > >  *     goto out;
> > > > >  * tail_call_cnt++;
> > > > >  */
> > > > >
> > > > > Maybe we have this MAX_TAIL_CALL_CNT + 1 issue
> > > > > for arm/arm64 jit?
> > > >
> > > > That wouldn't be unreasonable. I don't have an arm or arm64 setup
> > > > available right now, but I can try to test it in qemu.
> > >
> > > On a brief check, there seems to be quite a mess in terms of the code
> > > and comments.
> > >
> > > E.g., in arch/x86/net/bpf_jit_comp32.c:
> > >
> > > /*
> > >  * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
> > >  *     goto out;
> > >  */
> > >
> > > ^^^^ here comment is wrong
> > >
> > > [...]
> > >
> > > /* cmp edx,hi */
> > > EMIT3(0x83, add_1reg(0xF8, IA32_EBX), hi);
> > > EMIT2(IA32_JNE, 3);
> > > /* cmp ecx,lo */
> > > EMIT3(0x83, add_1reg(0xF8, IA32_ECX), lo);
> > >
> > > /* ja out */
> > > EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
> > >
> > > ^^^ JAE is >=, right? But the comment says JA.
> > >
> > >
> > > As for arch/x86/net/bpf_jit_comp.c, both comment and the code seem to
> > > do > MAX_TAIL_CALL_CNT, but you are saying JIT is correct. What am I
> > > missing?
> > >
> > > Can you please check all the places where MAX_TAIL_CALL_CNT is used
> > > throughout the code? Let's clean this up in one go.
> > >
> > > Also, given it's so easy to do this off-by-one error, can you please
> > > add a negative test validating that 33 tail calls are not allowed?
> > > I assume we have a positive test that allows exactly MAX_TAIL_CALL_CNT,
> > > but please double-check that as well.
> >
> > Ok, I see that you've added this in your bpf tests patch set. Please
> > consider, additionally, implementing a similar test as part of
> > selftests/bpf (specifically in test_progs). We run test_progs
> > continuously in CI for every incoming patch/patchset, so it has much
> > higher chances of capturing any regressions.
> >
> > I'm also thinking that this MAX_TAIL_CALL_CNT change should probably
> > go into the bpf-next tree. First, this off-by-one behavior was around
> > for a while and it doesn't cause serious issues, even if abused. But
> > on the other hand, it will make your tail call tests fail, when
> > applied into bpf-next without your change. So I think we should apply
> > both into bpf-next.
>
> I can confirm that the off-by-one behaviour is present on arm. Below
> is the test output running on qemu. Test #4 calls itself recursively
> and increments a counter each time, so the correct result should be 1
> + MAX_TAIL_CALL_CNT.
>
> test_bpf: #0 Tail call leaf jited:1 71 PASS
> test_bpf: #1 Tail call 2 jited:1 134 PASS
> test_bpf: #2 Tail call 3 jited:1 164 PASS
> test_bpf: #3 Tail call 4 jited:1 257 PASS
> test_bpf: #4 Tail call error path, max count reached jited:1 ret 34 != 33 FAIL
> test_bpf: #5 Tail call error path, NULL target jited:1 114 PASS
> test_bpf: #6 Tail call error path, index out of range jited:1 112 PASS
> test_bpf: test_tail_calls: Summary: 6 PASSED, 1 FAILED, [7/7 JIT'ed]
>
> The MAX_TAIL_CALL_CNT constant is referenced in the following JITs.
>
> arch/arm64/net/bpf_jit_comp.c
> arch/arm/net/bpf_jit_32.c
> arch/mips/net/ebpf_jit.c
> arch/powerpc/net/bpf_jit_comp32.c
> arch/powerpc/net/bpf_jit_comp64.c
> arch/riscv/net/bpf_jit_comp32.c
> arch/riscv/net/bpf_jit_comp64.c
> arch/s390/net/bpf_jit_comp.c
> arch/sparc/net/bpf_jit_comp_64.c
> arch/x86/net/bpf_jit_comp32.c
> arch/x86/net/bpf_jit_comp.c
>
> The x86 JITs all pass the test, even though the comments are wrong.
> The comments can easily be fixed of course. For JITs that have the
> off-by-one behaviour, an easy fix would be to change all occurrences
> of MAX_TAIL_CALL_CNT to MAX_TAIL_CALL_CNT - 1. We must first know
> which JITs affected though.

If you are going to fix ARM, please send a fix for the x86 comments as
well.

>
> The fix is easy but setting up the test is hard. It took me quite some
> time to get the qemu/arm setup up and running. If the same has to be
> done for arm64, mips64, powerpc, powerpc64, riscv32, risc64, sparc and
> s390, I will need some help with this. If someone already has a
> working setup for any of the systems, the test can be performed on
> that.
>

Unfortunately, I only have an x86-64 setup myself. The libbpf CI and
kernel-patches CI that we use to run all tests also run selftests
against x86-64 only. There was an effort to add s390x support, but it
is temporarily halted and not done yet. No one has volunteered to set
up any other platforms yet, and I don't know whether that's possible,
or how hard it would be, within the GitHub Actions platform we are
currently using.

So in short, I understand the challenges of testing all those
platforms and I don't really expect any single person to do all that
work.

I've applied your fix; please follow up with the ARM and comment fixes.

> Or perhaps there is a better way to do this? If I implement a similar
> test in selftest/bpf, that would trigger the CI when the patch is
> submitted and we will see which JITs we need to fix.

The other nice benefit of implementing this in selftests/bpf, besides
continuous testing, is that you write it in C, which lets you express
much more complicated logic more easily.

>
> > On a related topic, please don't forget to include the target kernel
> > tree for your patches: [PATCH bpf] or [PATCH bpf-next].
>
> I'll add that! All patches I sent related to this are for the bpf-next tree.
>
> Johan
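
For reference, the difference between the two guard conditions discussed
in this thread can be modelled in a few lines of plain C. This is only a
sketch, not JIT code: count_runs() and its strict_greater parameter are
hypothetical names, and it assumes MAX_TAIL_CALL_CNT is 32, which is what
the expected result of 33 in test #4 implies.

```c
#include <stdbool.h>

/* Assumption: the limit implied by "ret 34 != 33" in the test output. */
#define MAX_TAIL_CALL_CNT 32

/*
 * Hypothetical model of a program that bumps a counter and then
 * tail-calls itself until the guard refuses the call. Returns how
 * many times the program body ran.
 */
int count_runs(bool strict_greater)
{
	int tail_call_cnt = 0;
	int runs = 0;

	for (;;) {
		bool over;

		runs++; /* program body executes once per (tail) call */

		/* Guard emitted before each tail call. */
		if (strict_greater)
			over = tail_call_cnt > MAX_TAIL_CALL_CNT;
		else
			over = tail_call_cnt >= MAX_TAIL_CALL_CNT;

		if (over)
			break; /* goto out */

		tail_call_cnt++;
	}

	return runs;
}
```

In this model count_runs(true) returns 34, matching the failing arm
result, while count_runs(false) returns 33, i.e. 1 + MAX_TAIL_CALL_CNT.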