Re: Problem testing with S390x under QEMU on x86_64


 



On Tue, 2024-08-20 at 17:38 -0700, Tony Ambardar wrote:


[...]

> I used the command line:
>     ./test_progs -d get_stack_raw_tp,stacktrace_build_id,verifier_iterating_callbacks,tailcalls
> 
> which includes the current DENYLIST.s390x as well as 'tailcalls', which
> is also excluded by the kernel-patches/bpf s390x CI. I note the CI
> excludes several more tests that seem to work. Any idea why that is?
> 
> For reference, the issue with 'tailcalls/tailcall_hierarchy_count' is
> an RCU stall and kernel hang:
> 
> root@(none):/usr/libexec/kselftests-bpf# ./test_progs -v --debug -n 332/19
> bpf_testmod.ko is already unloaded.
> Loading bpf_testmod.ko...
> Successfully loaded bpf_testmod.ko.
> test_tailcall_hierarchy_count:PASS:load obj 0 nsec
> test_tailcall_hierarchy_count:PASS:find entry prog 0 nsec
> test_tailcall_hierarchy_count:PASS:prog_fd 0 nsec
> test_tailcall_hierarchy_count:PASS:find jmp_table 0 nsec
> test_tailcall_hierarchy_count:PASS:map_fd 0 nsec
> test_tailcall_hierarchy_count:PASS:update jmp_table 0 nsec
> test_tailcall_hierarchy_count:PASS:find data_map 0 nsec
> test_tailcall_hierarchy_count:PASS:open fentry_obj file 0 nsec
> test_tailcall_hierarchy_count:PASS:find fentry prog 0 nsec
> test_tailcall_hierarchy_count:PASS:set_attach_target subprog_tail 0 nsec
> test_tailcall_hierarchy_count:PASS:load fentry_obj 0 nsec
> test_tailcall_hierarchy_count:PASS:attach_trace 0 nsec
> rcu: INFO: rcu_sched self-detected stall on CPU
> rcu:    0-....: (1 GPs behind) idle=4eb4/1/0x4000000000000000 softirq=527/528 fqs=1050
> rcu:    (t=2100 jiffies g=-379 q=20 ncpus=2)
> CPU: 0 UID: 0 PID: 84 Comm: test_progs Tainted: G           O      6.10.0-12706-g853081e84612-dirty #111
> Tainted: [O]=OOT_MODULE
> Hardware name: QEMU 8561 QEMU (KVM/Linux)
> Krnl PSW : 0704f00180000000 000003ffe00f8fca (lock_release+0xf2/0x190)
>            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
> Krnl GPRS: 00000000b298dd12 0000000000000000 000002f23fd767c8 000003ffe1848800
>            0000000000000001 0000037fe034edbc 0000037fe034fd74 0000000000000001
>            0700037fe034edc8 000003ffe0249e48 000003ffe1848800 000003ffe19ba7c8
>            000003ff9f7a7f90 0000037fe034ef00 000003ffe00f8f96 0000037fe034ed78
> Krnl Code: 000003ffe00f8fbe: a7820300           tmhh    %r8,768
>            000003ffe00f8fc2: a7840004           brc     8,000003ffe00f8fca
>           #000003ffe00f8fc6: ad03f0a0           stosm   160(%r15),3
>           >000003ffe00f8fca: eb8ff0a80004       lmg     %r8,%r15,168(%r15)
>            000003ffe00f8fd0: 07fe               bcr     15,%r14
>            000003ffe00f8fd2: c0e500011057       brasl   %r14,000003ffe011b080
>            000003ffe00f8fd8: ec26ffa6007e       cij     %r2,0,6,000003ffe00f8f24
>            000003ffe00f8fde: c01000b78b96       larl    %r1,000003ffe17ea70a
> Call Trace:
>  [<000003ffe00f8fca>] lock_release+0xf2/0x190
> ([<000003ffe00f8f96>] lock_release+0xbe/0x190)
>  [<000003ffe0249ea4>] __bpf_prog_exit_recur+0x5c/0x68
>  [<000003ff6001e0b0>] bpf_trampoline_73014444060+0xb0/0xd2
>  [<000003ff60024d14>] bpf_prog_eb7edc599e93dcc8_entry+0x5c/0xc8
>  [<000003ff60024d14>] bpf_prog_eb7edc599e93dcc8_entry+0x5c/0xc8
>  [<000003ff60024d14>] bpf_prog_eb7edc599e93dcc8_entry+0x5c/0xc8
>  [<000003ff60024d2a>] bpf_prog_eb7edc599e93dcc8_entry+0x72/0xc8
>  [<000003ff60024d2a>] bpf_prog_eb7edc599e93dcc8_entry+0x72/0xc8
>  [<000003ff60024d14>] bpf_prog_eb7edc599e93dcc8_entry+0x5c/0xc8
>  [<000003ff60024d14>] bpf_prog_eb7edc599e93dcc8_entry+0x5c/0xc8
>  [<000003ff60024d14>] bpf_prog_eb7edc599e93dcc8_entry+0x5c/0xc8
>  [<000003ffe084ecee>] bpf_test_run+0x216/0x3a8
>  [<000003ffe084f9cc>] bpf_prog_test_run_skb+0x21c/0x630
>  [<000003ffe0202ad2>] __sys_bpf+0x7ea/0xbb0
>  [<000003ffe0203114>] __s390x_sys_bpf+0x44/0

Thanks for the detailed analysis! I will need to port

commit 116e04ba1459fc08f80cf27b8c9f9f188be0fcb2
Author: Leon Hwang <hffilwlqm@xxxxxxxxx>
Date:   Sun Jul 14 20:39:00 2024 +0800

    bpf, x64: Fix tailcall hierarchy

to s390x to fix this.

> Another curiosity is with 'uprobe_multi_test/attach_uprobe_fails',
> which usually succeeds but generates an inode warning in
> kernel/events/uprobes.c: (with cross-compiled and native test_progs)
> 
> #416     uprobe_autoattach:OK
> ref_ctr_offset mismatch. inode: 0x73c7 offset: 0x3c9b78 ref_ctr_offset(old): 0x464d7be ref_ctr_offset(new): 0x464d7bc
> #417/1   uprobe_multi_test/skel_api:OK
> #417/2   uprobe_multi_test/attach_api_pattern:OK
> #417/3   uprobe_multi_test/attach_api_syms:OK
> #417/4   uprobe_multi_test/link_api:OK
> #417/5   uprobe_multi_test/bench_uprobe:OK
> #417/6   uprobe_multi_test/bench_usdt:OK
> #417/7   uprobe_multi_test/attach_api_fails:OK
> #417/8   uprobe_multi_test/attach_uprobe_fails:OK
> #417/9   uprobe_multi_test/consumers:OK
> #417     uprobe_multi_test:OK
> 
> but occasionally I see this kernel fault:
> 
> #416     uprobe_autoattach:OK
> User process fault: interruption code 0001 ilc:1 in test_progs[3c9ba2,2aa3b580000+cc5000]
> CPU: 0 UID: 0 PID: 165 Comm: new_name Tainted: G           OE      6.10.0-12707-g8189b8007d01 #114
> Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> Hardware name: QEMU 8561 QEMU (KVM/Linux)
> User PSW : 0705000180000000 000002aa3b949ba2
>            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:0 PM:0 RI:0 EA:3
> User GPRS: cccccccccccccccd 0000000000000000 000003ffbe080000 0000000000000000
>            000003ffbeb74828 0000000000000006 0000000000000000 000002aa3c245928
>            000003ffbeb2cbc0 000003ffbeb2d020 0000000000000003 000003ffdb379f20
>            000003ffbeb2cf98 0000000000000000 000002aa3b94a400 000003ffdb379f20
> User Code:>000002aa3b949ba2: 0000               illegal
>            000002aa3b949ba4: 0700               bcr     0,%r0
>            000002aa3b949ba6: b3cd00b0           lgdr    %r11,%f0
>            000002aa3b949baa: 07fe               bcr     15,%r14
>            000002aa3b949bac: 0707               bcr     0,%r7
>            000002aa3b949bae: 0707               bcr     0,%r7
>            000002aa3b949bb0: ebbff0580024       stmg    %r11,%r15,88(%r15)
>            000002aa3b949bb6: e3f0ff48ff71       lay     %r15,-184(%r15)
> Last Breaking-Event-Address:
>  [<000002aa3b94a3fa>] test_progs[3ca3fa,2aa3b580000+cc5000]
> 
> 
> Have you seen this fault before? Is the inode warning expected by the
> test?

Yes, this is caused by:

/* attach fail due to wrong ref_ctr_offs on one of the uprobes */
attach_uprobe_fail_refctr(skel);

The fault is a user fault, not a kernel fault. I could not reproduce it
on a real s390x machine. This may be an emulation problem, since
apparently the kernel does not recognize that "0000 illegal" is a
uprobe. Quite some time ago I fixed a similar issue in this area;
perhaps this is a new flavour of it. I will investigate.

[...]

Best regards,
Ilya






