On Thu, Oct 26, 2023 at 08:11:26PM +0530, Naresh Kamboju wrote:
> Following kernel crash noticed on qemu-arm64 while running LTP syscalls
> set_robust_list test case running Linux next 6.6.0-rc7-next-20231026 and
> 6.6.0-rc7-next-20231025.
>
> BAD: next-20231025
> Good: next-20231024
>
> Reported-by: Linux Kernel Functional Testing <lkft@xxxxxxxxxx>
> Reported-by: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>
>
> Log:
> ----
> <1>[ 203.119139] Unable to handle kernel unknown 43 at virtual
> address 0001ffff9e2e7d78
> <1>[ 203.119838] Mem abort info:
> <1>[ 203.120064] ESR = 0x000000009793002b
> <1>[ 203.121040] EC = 0x25: DABT (current EL), IL = 32 bits
> set_robust_list01 1 TPASS : set_robust_list: retval = -1
> (expected -1), errno = 22 (expected 22)
> set_robust_list01 2 TPASS : set_robust_list: retval = 0
> (expected 0), errno = 0 (expected 0)
> <1>[ 203.124496] SET = 0, FnV = 0
> <1>[ 203.124778] EA = 0, S1PTW = 0
> <1>[ 203.125029] FSC = 0x2b: unknown 43

It looks like this is fallout from the LPA2 enablement.

According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:

    0b101011 When FEAT_LPA2 is implemented:
             Translation fault, level -1.

It's triggered here by an LDTR in a get_user() on a bogus userspace address.
The exception is expected, and it's supposed to be handled via the exception
fixups, but the LPA2 patches didn't update the fault_info table entries for all
the level -1 faults, and so those all get handled by do_bad() and don't call
fixup_exception(), causing them to be fatal.

It should be relatively simple to update the fault_info table for the level -1
faults, but given the other issues we're seeing I think it's probably worth
dropping the LPA2 patches for the moment.

Mark.

> <1>[ 203.126470] Data abort info:
> <1>[ 203.126710] Access size = 4 byte(s)
> <1>[ 203.126969] SSE = 0, SRT = 19
> <1>[ 203.127708] SF = 0, AR = 0
> <1>[ 203.128213] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> <1>[ 203.128788] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> <1>[ 203.130416] user pgtable: 4k pages, 52-bit VAs, pgdp=000000010606a780
> <1>[ 203.130817] [0001ffff9e2e7d78] pgd=0000000000000000
> <0>[ 203.132603] Internal error: Oops: 000000009793002b [#1] PREEMPT SMP
> <4>[ 203.133483] Modules linked in: btrfs blake2b_generic libcrc32c
> xor xor_neon raid6_pq zstd_compress crct10dif_ce sm3_ce sm3 sha3_ce
> sha512_ce sha512_arm64 fuse drm backlight dm_mod ip_tables x_tables
> <4>[ 203.135177] CPU: 1 PID: 653 Comm: set_robust_list Not tainted
> 6.6.0-rc7-next-20231026 #1
> <4>[ 203.135642] Hardware name: linux,dummy-virt (DT)
> <4>[ 203.136609] pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT
> -SSBS BTYPE=--)
> <4>[ 203.137028] pc : handle_futex_death (kernel/futex/core.c:661
> (discriminator 6))
> <4>[ 203.138844] lr : handle_futex_death
> (arch/arm64/include/asm/uaccess.h:46 (discriminator 1)
> kernel/futex/core.c:661 (discriminator 1))
> <4>[ 203.139132] sp : ffff8000805c3c10
> <4>[ 203.139356] x29: ffff8000805c3c10 x28: 0000ffffbf187740 x27:
> d53bd04035000220
> <4>[ 203.140366] x26: 0000000000000000 x25: fff00000c6195280 x24:
> fff00000c6195280
> <4>[ 203.141055] x23: 0000000000000001 x22: ffffa4e6aeef09d0 x21:
> 0001ffff9e2e7d78
> <4>[ 203.141771] x20: 0001ffff9e2e7d78 x19: 0001ffff9e2e7d78 x18:
> ffff8000805c3cf8
> <4>[ 203.142457] x17: 0000000000000000 x16: ffffa4e6aeae7078 x15:
> 000000000000000a
> <4>[ 203.143134] x14: 0000000000000000 x13: 1ffe000018258661 x12:
> ffff8000805c3cf8
> <4>[ 203.143809] x11: 0000000000000000 x10: fff00000c12c3308 x9 :
> ffffa4e6ad0e5748
> <4>[ 203.144504] x8 : ffff8000805c3c38 x7 : 0000000000000000 x6 :
> 0000000000000001
> <4>[ 203.145186] x5 : 0000000000000000 x4 : fff00000c6195280 x3 :
> 0000000000000000
> <4>[ 203.145929] x2 : 0000000000000000 x1 : 000ffffffffffffc x0 :
> 0001ffff9e2e7d78
> <4>[ 203.147032] Call trace:
> <4>[ 203.147254] handle_futex_death (kernel/futex/core.c:661 (discriminator 6))
> <4>[ 203.147560] exit_robust_list (kernel/futex/core.c:828)
> <4>[ 203.148348] futex_exit_release (kernel/futex/core.c:1035
> (discriminator 1) kernel/futex/core.c:1131 (discriminator 1))
> <4>[ 203.148891] exit_mm_release (kernel/fork.c:1657)
> <4>[ 203.149669] do_exit (kernel/exit.c:541 kernel/exit.c:858)
> <4>[ 203.149897] do_group_exit (kernel/exit.c:1002)
> <4>[ 203.150209] __arm64_sys_exit_group (kernel/exit.c:1032)
> <4>[ 203.150980] invoke_syscall (arch/arm64/include/asm/current.h:19
> arch/arm64/kernel/syscall.c:56)
> <4>[ 203.151234] el0_svc_common.constprop.0
> (include/linux/thread_info.h:127 (discriminator 2)
> arch/arm64/kernel/syscall.c:144 (discriminator 2))
> <4>[ 203.151999] do_el0_svc (arch/arm64/kernel/syscall.c:156)
> <4>[ 203.152231] el0_svc (arch/arm64/include/asm/daifflags.h:28
> arch/arm64/kernel/entry-common.c:133
> arch/arm64/kernel/entry-common.c:144
> arch/arm64/kernel/entry-common.c:679)
> <4>[ 203.152936] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697)
> <4>[ 203.153518] el0t_64_sync (arch/arm64/kernel/entry.S:595)
> <0>[ 203.154424] Code: d50323bf d65f03c0 9248fa93 52800002 (b8400a73)
> All code
> ========
> 0: d50323bf autiasp
> 4: d65f03c0 ret
> 8: 9248fa93 and x19, x20, #0xff7fffffffffffff
> c: 52800002 mov w2, #0x0 // #0
> 10:* b8400a73 ldtr w19, [x19] <-- trapping instruction
>
> Code starting with the faulting instruction
> ===========================================
> 0: b8400a73 ldtr w19, [x19]
> <4>[ 203.155308] ---[ end trace 0000000000000000 ]---
> <1>[ 203.156234] Fixing recursive fault but reboot is needed!
> <3>[ 203.157116] BUG: using smp_processor_id() in preemptible
> [00000000] code: set_robust_list/653
> <4>[ 203.158116] caller is debug_smp_processor_id (lib/smp_processor_id.c:61)
> <4>[ 203.158983] CPU: 1 PID: 653 Comm: set_robust_list Tainted: G
> D 6.6.0-rc7-next-20231026 #1
> <4>[ 203.159451] Hardware name: linux,dummy-virt (DT)
> <4>[ 203.159990] Call trace:
> <4>[ 203.160394] dump_backtrace (arch/arm64/kernel/stacktrace.c:235)
> <4>[ 203.160625] show_stack (arch/arm64/kernel/stacktrace.c:242)
> <4>[ 203.160854] dump_stack_lvl (lib/dump_stack.c:107)
> <4>[ 203.161869] dump_stack (lib/dump_stack.c:114)
> <4>[ 203.162093] check_preemption_disabled
> (arch/arm64/include/asm/current.h:19
> arch/arm64/include/asm/preempt.h:54 lib/smp_processor_id.c:53)
> <4>[ 203.162898] debug_smp_processor_id (lib/smp_processor_id.c:61)
> <4>[ 203.163176] __schedule (kernel/sched/core.c:6578 (discriminator 1))
> <4>[ 203.163894] do_task_dead (kernel/sched/core.c:6705)
> <4>[ 203.164143] make_task_dead
> (arch/arm64/include/asm/atomic_ll_sc.h:95 (discriminator 3)
> arch/arm64/include/asm/atomic.h:49 (discriminator 3)
> include/linux/atomic/atomic-arch-fallback.h:747 (discriminator 3)
> include/linux/atomic/atomic-instrumented.h:253 (discriminator 3)
> include/linux/refcount.h:193 (discriminator 3)
> include/linux/refcount.h:250 (discriminator 3)
> include/linux/refcount.h:267 (discriminator 3) kernel/exit.c:979
> (discriminator 3))
> <4>[ 203.164871] die (arch/arm64/kernel/traps.c:239)
> <4>[ 203.165093] die_kernel_fault (arch/arm64/mm/fault.c:321)
> <4>[ 203.165905] do_mem_abort (arch/arm64/mm/fault.c:850)
> <4>[ 203.166149] el1_abort (arch/arm64/include/asm/daifflags.h:28
> arch/arm64/kernel/entry-common.c:399)
> <4>[ 203.166864] el1h_64_sync_handler (arch/arm64/kernel/entry-common.c:486)
> <4>[ 203.167173] el1h_64_sync (arch/arm64/kernel/entry.S:590)
> <4>[ 203.167824] handle_futex_death (kernel/futex/core.c:661 (discriminator 6))
> <4>[ 203.168329] exit_robust_list (kernel/futex/core.c:828)
> <4>[ 203.168829] futex_exit_release (kernel/futex/core.c:1035
> (discriminator 1) kernel/futex/core.c:1131 (discriminator 1))
> <4>[ 203.169375] exit_mm_release (kernel/fork.c:1657)
> <4>[ 203.169884] do_exit (kernel/exit.c:541 kernel/exit.c:858)
> <4>[ 203.170372] do_group_exit (kernel/exit.c:1002)
> <4>[ 203.170857] __arm64_sys_exit_group (kernel/exit.c:1032)
> <4>[ 203.171643] invoke_syscall (arch/arm64/include/asm/current.h:19
> arch/arm64/kernel/syscall.c:56)
> <4>[ 203.172281] el0_svc_common.constprop.0
> (include/linux/thread_info.h:127 (discriminator 2)
> arch/arm64/kernel/syscall.c:144 (discriminator 2))
> <4>[ 203.172815] do_el0_svc (arch/arm64/kernel/syscall.c:156)
> <4>[ 203.173284] el0_svc (arch/arm64/include/asm/daifflags.h:28
> arch/arm64/kernel/entry-common.c:133
> arch/arm64/kernel/entry-common.c:144
> arch/arm64/kernel/entry-common.c:679)
> <4>[ 203.173769] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697)
> <4>[ 203.174052] el0t_64_sync (arch/arm64/kernel/entry.S:595)
>
>
>
> Links:
> - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231026/testrun/20823098/suite/log-parser-test/test/check-kernel-bug/log
> - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231026/testrun/20823098/suite/log-parser-test/tests/
> - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231026/testrun/20823050/suite/log-parser-test/tests/
>
> --
> Linaro LKFT
> https://lkft.linaro.org
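
For readers who want to see why a stale fault_info entry turns an expected
get_user() fault into an oops, here is a rough userspace model of the dispatch
described in the reply. It is a sketch under stated assumptions, not the
kernel's code: every model_* name is hypothetical, the handlers are stand-ins
that only report whether the fixup path would be reached, and only the
translation-fault codes are modelled. The two details taken from the thread are
the ESR value in the log and the 0x2b encoding for a level -1 translation
fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* For a data abort, the fault status code is ESR bits [5:0]. */
#define MODEL_ESR_FSC_MASK   0x3fu
#define MODEL_NR_FSC         64u
#define MODEL_FSC_FAULT_LM1  0x2bu /* level -1 translation fault (FEAT_LPA2) */

/* Hypothetical outcomes for a kernel-mode access to a bad user address. */
enum model_outcome {
	MODEL_FIXED_UP, /* fixup path reached: get_user() would fail cleanly */
	MODEL_FATAL,    /* no fixup attempted: the kernel would oops */
};

/* Hypothetical, cut-down analogue of one fault_info table entry. */
struct model_fault_info {
	enum model_outcome (*fn)(void);
	const char *name;
};

/* Stand-in for do_bad(): never consults the exception fixups. */
static enum model_outcome model_do_bad(void)
{
	return MODEL_FATAL;
}

/* Stand-in for a translation fault handler that reaches the fixups. */
static enum model_outcome model_do_translation_fault(void)
{
	return MODEL_FIXED_UP;
}

/* Extract the fault status code from a syndrome value. */
unsigned int model_esr_to_fsc(uint64_t esr)
{
	return (unsigned int)(esr & MODEL_ESR_FSC_MASK);
}

/*
 * Look up the handler for a fault status code. level_m1_updated selects
 * between a table that still treats 0x2b as unknown and one whose level -1
 * entry has been updated.
 */
struct model_fault_info model_lookup(unsigned int fsc, bool level_m1_updated)
{
	struct model_fault_info bad = { model_do_bad, "unknown" };
	struct model_fault_info xlat = { model_do_translation_fault,
					 "translation fault" };

	assert(fsc < MODEL_NR_FSC);

	/* Translation faults at levels 0..3 use codes 0x04..0x07. */
	if (fsc >= 0x04u && fsc <= 0x07u)
		return xlat;
	if (fsc == MODEL_FSC_FAULT_LM1 && level_m1_updated)
		return xlat;
	return bad;
}

/* Decode the syndrome and run whichever handler the table selects. */
enum model_outcome model_handle(uint64_t esr, bool level_m1_updated)
{
	return model_lookup(model_esr_to_fsc(esr), level_m1_updated).fn();
}
```

With the ESR from the log (0x9793002b), model_esr_to_fsc() yields 0x2b, i.e.
43. model_handle() then reports MODEL_FATAL when the level -1 entry has not
been updated and MODEL_FIXED_UP when it has, which mirrors the difference
between landing in do_bad() and reaching fixup_exception().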