> On Oct 4, 2023, at 6:50 PM, Song Liu <songliubraving@xxxxxxxx> wrote:
>> On Oct 4, 2023, at 5:11 PM, Song Liu <songliubraving@xxxxxxxx> wrote:
[...]
>>>>>
>>>>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>>>>> index a8c7e1c5abfa..fd8d4b0addfc 100644
>>>>>> --- a/kernel/bpf/hashtab.c
>>>>>> +++ b/kernel/bpf/hashtab.c
>>>>>> @@ -155,13 +155,15 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
>>>>>> hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1);
>>>>>>
>>>>>> preempt_disable();
>>>>>> + local_irq_save(flags);
>>>>>> if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>>>>>> __this_cpu_dec(*(htab->map_locked[hash]));
>>>>>> + local_irq_restore(flags);
>>>>>> preempt_enable();
>>>>>> return -EBUSY;
>>>>>> }
>>>>>>
>>>>>> - raw_spin_lock_irqsave(&b->raw_lock, flags);
>>>>>> + raw_spin_lock(&b->raw_lock);
>>>>
>>>> Song,
>>>>
>>>> take a look at s390 crash in BPF CI.
>>>> I suspect this patch is causing it.
>>>
>>> It indeed looks like triggered by this patch. But I haven't figured
>>> out why it happens. v1 seems ok for the same tests.

Here is an update on my findings today.

I tried to reproduce the issue locally with qemu on my server (x86_64),
using the following artifacts:
1. bzImage and selftests from CI (requires a GitHub login):
   https://github.com/kernel-patches/bpf/suites/16885416280/artifacts/964765766
2. Cross compiler:
   https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/13.2.0/x86_64-gcc-13.2.0-nolibc-s390-linux.tar.gz
3. Root image:
   https://libbpf-ci.s3.us-west-1.amazonaws.com/libbpf-vmtest-rootfs-2022.10.23-bullseye-s390x.tar.zst

With the bzImage built in CI, I can reproduce the issue in qemu.
However, if I build the kernel locally with the cross compiler
(starting from the CI .config and then running olddefconfig), the
issue does not reproduce. I have attached both .config files. They
look very similar, apart from the compiler version:

-CONFIG_CC_VERSION_TEXT="gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0"
+CONFIG_CC_VERSION_TEXT="s390-linux-gcc (GCC) 13.2.0"

So far, I still think v2 is the right patch, but I really cannot
explain the issue seen with the bzImage from CI. I cannot make much
sense of the s390 assembly code either. (TIL: gdb on my x86
server can disassemble s390 binaries.)
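For reference, the ordering that the quoted v2 hunk sets up in
htab_lock_bucket() can be modelled in plain userspace C. This is only a
rough sketch, not kernel code, and it does not try to reproduce the s390
crash. The names model_bucket, model_lock_bucket() and
model_unlock_bucket() are hypothetical, a single int stands in for one
CPU's htab->map_locked[hash] counter, and the preempt/irq calls appear
only as comments. The unlock side is not part of the quoted hunk and is
assumed here to mirror the lock side in reverse.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace stand-in for one bucket as seen from one CPU. */
struct model_bucket {
	int map_locked;	/* models *(htab->map_locked[hash]) on this CPU */
	int lock_held;	/* models b->raw_lock being held */
};

/*
 * Same order as the quoted v2 hunk: bump the per-CPU counter first,
 * return -EBUSY on re-entry, and take the bucket lock only on success.
 */
static int model_lock_bucket(struct model_bucket *b)
{
	/* kernel: preempt_disable(); local_irq_save(flags); */
	if (++b->map_locked != 1) {
		--b->map_locked;
		/* kernel: local_irq_restore(flags); preempt_enable(); */
		return -EBUSY;
	}
	/* kernel: raw_spin_lock(&b->raw_lock); */
	assert(!b->lock_held);
	b->lock_held = 1;
	return 0;
}

/* Assumed mirror image of the lock path (not shown in the quoted hunk). */
static void model_unlock_bucket(struct model_bucket *b)
{
	assert(b->lock_held && b->map_locked == 1);
	b->lock_held = 0;
	--b->map_locked;
	/* kernel: irq restore and preempt_enable() would follow here */
}
```

In this model, a second model_lock_bucket() call before the matching
unlock returns -EBUSY and leaves the counter at 1, which is the
re-entrancy behaviour the map_locked counter is meant to provide.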

Ilya,

Could you please take a look at this?

Thanks,
Song

PS: The root image from the CI is not easy to use. Hopefully you
have something better than that.

Attachment: ci.config
Attachment: local.config