Re: qemu-arm64: CONFIG_ARM64_64K_PAGES=y kernel crash on qemu-arm64 with Linux next-20241210 and above

On 2024/12/19 06:37, Qu Wenruo wrote:


On 2024/12/19 02:22, Naresh Kamboju wrote:
On Wed, 18 Dec 2024 at 17:33, Naresh Kamboju
<naresh.kamboju@xxxxxxxxxx> wrote:

The following kernel crash was noticed on qemu-arm64 while running
Linux next-20241210 (through next-20241218) kernels built with
  - CONFIG_ARM64_64K_PAGES=y
  - CONFIG_ARM64_16K_PAGES=y
and running LTP smoke tests.

First seen on Linux next-20241210.
   Good: next-20241209
   Bad:  next-20241210 and next-20241218

qemu-arm64: 9.1.2

Has anyone noticed this?


Anders bisected this reported regression and found:
# first bad commit:
   [9c1d66793b6faa00106ae4c866359578bfc012d2]
   btrfs: validate system chunk array at btrfs_validate_super()

Weird, I run fstests daily on an aarch64 VM with 64K page size,
but have never hit this crash.

And the original crash call trace only points back to ext4, not btrfs.

Would you mind testing it with KASAN enabled?

Another thing: how do you enable both the 16K and 64K page sizes at
the same time?

The Kconfig should only select one page size IIRC.
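One quick way to see which page size a given .config actually selects is to count the page-size options set to =y. This is a minimal sketch, assuming the arm64 symbols are ARM64_4K_PAGES, ARM64_16K_PAGES and ARM64_64K_PAGES (members of a single Kconfig choice); the helper count_page_size_options() is hypothetical and not part of any kernel or LKFT tooling.

```c
#include <string.h>

/* The arm64 page-size options as they appear in a generated .config.
 * Assumption: these are the three members of the page-size choice. */
static const char *const page_size_syms[] = {
    "CONFIG_ARM64_4K_PAGES=y",
    "CONFIG_ARM64_16K_PAGES=y",
    "CONFIG_ARM64_64K_PAGES=y",
};

/* Hypothetical helper: count how many page-size options are =y in the
 * given .config text. Only whole lines are matched, so a line such as
 * "# CONFIG_ARM64_16K_PAGES is not set" is not counted. */
int count_page_size_options(const char *config)
{
    int count = 0;
    const char *line = config;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        for (size_t i = 0; i < sizeof(page_size_syms) / sizeof(page_size_syms[0]); i++) {
            if (len == strlen(page_size_syms[i]) &&
                strncmp(line, page_size_syms[i], len) == 0)
                count++;
        }
        if (!end)
            break;          /* last line had no trailing newline */
        line = end + 1;     /* advance to the next line */
    }
    return count;
}
```

Because a Kconfig choice lets only one member be =y, a result other than 1 would suggest that the config metadata possibly describes more than one build.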

And for the bisection, did it target the test failure or the crash?

For the test failure, it looks like an older btrfs-progs created
invalid system chunk items, which are now caught by the newer, stricter
sanity checks.
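The kind of check described above can be pictured roughly as follows. This is a simplified sketch, not the btrfs implementation: struct chunk_item_sketch and check_chunk_sectorsize() are hypothetical, and it assumes the check compares a chunk item's sector size with the filesystem sectorsize, which is what the "have 65536 expect 4096" pair in the log below suggests.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for a chunk item; the real
 * btrfs on-disk chunk structure has many more fields. */
struct chunk_item_sketch {
    uint64_t chunk_start;
    uint32_t sector_size;
};

/* Hypothetical check: reject a chunk whose sector size differs from the
 * filesystem sectorsize. Returns 0 on success, or -22 (-EINVAL, the value
 * open_ctree reports in the log) on mismatch. */
int check_chunk_sectorsize(const struct chunk_item_sketch *chunk,
                           uint32_t fs_sectorsize)
{
    if (chunk->sector_size != fs_sectorsize) {
        fprintf(stderr,
                "corrupt superblock syschunk array: chunk_start=%llu, "
                "invalid chunk sectorsize, have %u expect %u\n",
                (unsigned long long)chunk->chunk_start,
                (unsigned)chunk->sector_size, (unsigned)fs_sectorsize);
        return -22;
    }
    return 0;
}
```

With the values from the log (chunk_start=22020096, chunk sector size 65536, filesystem sectorsize 4096), the sketch rejects the chunk, while a chunk with sector size 4096 passes.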

For the crash, unfortunately I'm not able to reproduce it with fstests.
I will try LTP soon.

Thanks,
Qu

Thanks,
Qu


Test log:
---------
tst_test.c:1799: TINFO: === Testing on btrfs ===
tst_test.c:1158: TINFO: Formatting /dev/loop0 with btrfs opts='' extra opts=''
<6>[   71.880167] BTRFS: device fsid d492b571-012c-40a9-b8e1-efc97408d3bc devid 1 transid 6 /dev/loop0 (7:0) scanned by chdir01 (476)
tst_test.c:1170: TINFO: Mounting /dev/loop0 to /tmp/LTP_chdJeywxF/mntpoint fstyp=btrfs flags=0
<6>[   71.960245] BTRFS info (device loop0): first mount of filesystem d492b571-012c-40a9-b8e1-efc97408d3bc
<6>[   71.970667] BTRFS info (device loop0): using crc32c (crc32c-arm64) checksum algorithm
<2>[   71.993486] BTRFS critical (device loop0): corrupt superblock syschunk array: chunk_start=22020096, invalid chunk sectorsize, have 65536 expect 4096
<3>[   71.995802] BTRFS error (device loop0): superblock contains fatal errors
<3>[   72.014538] BTRFS error (device loop0): open_ctree failed: -22
tst_test.c:1170: TBROK: mount(/dev/loop0, mntpoint, btrfs, 0, (nil)) failed: EINVAL (22)

Summary:
passed   48
failed   0
broken   1
skipped  0
warnings 0

Duration: 7.002s


===== symlink01 =====
command: symlink01
<12>[   72.494428] /usr/local/bin/kirk[253]: starting test symlink01 (symlink01)
symlink01    0  TINFO  :  Using /tmp/LTP_symmsYXet as tmpdir (tmpfs filesystem)
symlink01    1  TPASS  :  Creation of symbolic link file to no object file is ok
symlink01    2  TPASS  :  Creation of symbolic link file to no object file is ok
symlink01    3  TPASS  :  Creation of symbolic link file and object file via symbolic link is ok
symlink01    4  TPASS  :  Creating an existing symbolic link file error is caught
symlink01    5  TPASS  :  Creating a symbolic link which exceeds maximum pathname error is caught

Summary:
passed    5
failed    0
broken    0
skipped   0
warnings  0

Duration: 0.052s


===== stat04 =====
command: stat04
<12>[   72.966706] /usr/local/bin/kirk[253]: starting test stat04 (stat04)
tst_buffers.c:57: TINFO: Test is using guarded buffers
tst_tmpdir.c:316: TINFO: Using /tmp/LTP_staEABwgV as tmpdir (tmpfs filesystem)
<6>[   73.447708] loop0: detected capacity change from 0 to 614400
tst_device.c:96: TINFO: Found free device 0 '/dev/loop0'
tst_test.c:1860: TINFO: LTP version: 20240930
tst_test.c:1864: TINFO: Tested kernel: 6.13.0-rc3-next-20241218 #1 SMP PREEMPT @1734498806 aarch64
tst_test.c:1703: TINFO: Timeout per run is 0h 05m 24s
stat04.c:60: TINFO: Formatting /dev/loop0 with ext2 opts='-b 4096' extra opts=''
mke2fs 1.47.1 (20-May-2024)
<3>[   73.859753] operation not supported error, dev loop0, sector 614272 op 0x9:(WRITE_ZEROES) flags 0x10000800 phys_seg 0 prio class 0
stat04.c:61: TINFO: Mounting /dev/loop0 to /tmp/LTP_staEABwgV/mntpoint fstyp=ext2 flags=0
<6>[   73.939263] EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
<1>[   73.946378] Unable to handle kernel paging request at virtual address a8fff00000c0c224
<1>[   73.947878] Mem abort info:
<1>[   73.949153]   ESR = 0x0000000096000005
<1>[   73.959105]   EC = 0x25: DABT (current EL), IL = 32 bits
<1>[   73.960031]   SET = 0, FnV = 0
<1>[   73.960349]   EA = 0, S1PTW = 0
<1>[   73.960638]   FSC = 0x05: level 1 translation fault
<1>[   73.961005] Data abort info:
<1>[   73.961293]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
<1>[   73.963739]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[   73.964980]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[   73.967132] [a8fff00000c0c224] address between user and kernel address ranges
<0>[   73.968923] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
<4>[   73.970516] Modules linked in: btrfs blake2b_generic xor xor_neon raid6_pq zstd_compress sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm backlight ip_tables x_tables
<4>[   73.974237] CPU: 1 UID: 0 PID: 529 Comm: stat04 Not tainted 6.13.0-rc3-next-20241218 #1
<4>[   73.975359] Hardware name: linux,dummy-virt (DT)
<4>[   73.977170] pstate: 62402009 (nZCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
<4>[ 73.978295] pc : __kmalloc_node_noprof (mm/slub.c:492 mm/slub.c:505 mm/slub.c:532 mm/slub.c:3993 mm/slub.c:4152 mm/slub.c:4293 mm/slub.c:4300)
<4>[ 73.980200] lr : alloc_cpumask_var_node (lib/cpumask.c:62 (discriminator 2))
<4>[   73.981466] sp : ffff80008258f950
<4>[   73.982228] x29: ffff80008258f970 x28: ffffa93389398000 x27: 0000000000000001
<4>[   73.983875] x26: fffffc1fc0303080 x25: 00000000ffffffff x24: a8fff00000c0c224
<4>[   73.985649] x23: 0000000000000cc0 x22: ffffa93387f51d0c x21: 00000000ffffffff
<4>[   73.986188] x20: fff00000c0010400 x19: 0000000000000008 x18: 0000000000000000
<4>[   73.988686] x17: fff056cd748b0000 x16: ffff800080020000 x15: 0000000000000000
<4>[   73.990276] x14: 0000000000002a66 x13: 0000000000004000 x12: 0000000000000001
<4>[   73.992401] x11: 0000000000000002 x10: 0000000000004001 x9 : ffffa93387f51d0c
<4>[   73.993108] x8 : fff00000c2c99240 x7 : 0000000000000001 x6 : 0000000000000001
<4>[   73.993886] x5 : fff00000c4879800 x4 : 0000000000000000 x3 : 000000000033a401
<4>[   73.995550] x2 : 0000000000000000 x1 : a8fff00000c0c224 x0 : fff00000c0010400
<4>[   73.997017] Call trace:
<4>[ 73.998266] __kmalloc_node_noprof+0x100/0x4a0 P
<4>[ 73.999716] alloc_cpumask_var_node (lib/cpumask.c:62 (discriminator 2))
<4>[ 74.000942] alloc_workqueue_attrs (kernel/workqueue.c:4624 (discriminator 1))
<4>[ 74.001327] apply_wqattrs_prepare (kernel/workqueue.c:5263)
<4>[ 74.003095] apply_workqueue_attrs_locked (kernel/workqueue.c:5351)
<4>[ 74.003855] alloc_workqueue (kernel/workqueue.c:5722 (discriminator 1) kernel/workqueue.c:5772 (discriminator 1))
<4>[ 74.005398] ext4_fill_super (fs/ext4/super.c:5484 fs/ext4/super.c:5722)
<4>[ 74.006132] get_tree_bdev_flags (fs/super.c:1636)
<4>[ 74.007624] get_tree_bdev (fs/super.c:1660)
<4>[ 74.008664] ext4_get_tree (fs/ext4/super.c:5755)
<4>[ 74.009423] vfs_get_tree (fs/super.c:1814)
<4>[ 74.009703] path_mount (fs/namespace.c:3556 fs/namespace.c:3883)
<4>[ 74.010608] __arm64_sys_mount (fs/namespace.c:3896 fs/namespace.c:4107 fs/namespace.c:4084 fs/namespace.c:4084)
<4>[ 74.011527] invoke_syscall.constprop.0 (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
<4>[ 74.012798] do_el0_svc (include/linux/thread_info.h:135 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2) arch/arm64/kernel/syscall.c:151 (discriminator 2))
<4>[ 74.014042] el0_svc (arch/arm64/include/asm/irqflags.h:82 (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1) arch/arm64/kernel/entry-common.c:165 (discriminator 1) arch/arm64/kernel/entry-common.c:178 (discriminator 1) arch/arm64/kernel/entry-common.c:745 (discriminator 1))
<4>[ 74.014942] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:763)
<4>[ 74.015917] el0t_64_sync (arch/arm64/kernel/entry.S:600)
<0>[ 74.017042] Code: 12800019 b9402a82 aa1803e1 aa1403e0 (f8626b1a)
All code
========
    0: 12800019 mov w25, #0xffffffff            // #-1
    4: b9402a82 ldr w2, [x20, #40]
    8: aa1803e1 mov x1, x24
    c: aa1403e0 mov x0, x20
   10:* f8626b1a ldr x26, [x24, x2] <-- trapping instruction

Code starting with the faulting instruction
===========================================
    0: f8626b1a ldr x26, [x24, x2]
<4>[   74.019014] ---[ end trace 0000000000000000 ]---
tst_test.c:1763: TBROK: Test killed by SIGSEGV!

Summary:
passed   0
failed   0
broken   1
skipped  0
warnings 0
tst_device.c:269: TWARN: ioctl(/dev/loop0, LOOP_CLR_FD, 0) no ENXIO for too long
Tainted kernel: kernel died recently, i.e. there was an OOPS or BUG
Tainted kernel: ['kernel died recently, i.e. there was an OOPS or BUG']
Restarting SUT: host
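The "Mem abort info" lines in the oops above are a decoding of the raw ESR value, and the trapping instruction "ldr x26, [x24, x2]" uses register-offset addressing. The sketch below reproduces both from the values in the log, assuming the ESR_ELx field layout from the Arm architecture (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]; for data aborts, DFSC in ISS[5:0] and WnR in ISS[6]). All helper names are hypothetical.

```c
#include <stdint.h>

/* ESR_ELx field extraction (assumed Arm architecture layout). */
static inline unsigned esr_ec(uint64_t esr)   { return (unsigned)((esr >> 26) & 0x3f); }   /* exception class */
static inline unsigned esr_il(uint64_t esr)   { return (unsigned)((esr >> 25) & 0x1); }    /* 1 = 32-bit instruction */
static inline unsigned esr_iss(uint64_t esr)  { return (unsigned)(esr & 0x1ffffff); }      /* instruction-specific syndrome */
static inline unsigned esr_dfsc(uint64_t esr) { return (unsigned)(esr & 0x3f); }           /* data fault status code, ISS[5:0] */
static inline unsigned esr_wnr(uint64_t esr)  { return (unsigned)((esr >> 6) & 0x1); }     /* 1 = write, 0 = read */

/* Register-offset load "ldr Xt, [Xn, Xm]": the address is simply Xn + Xm. */
static inline uint64_t ldr_reg_offset_addr(uint64_t xn, uint64_t xm)
{
    return xn + xm;
}
```

With ESR = 0x96000005 this gives EC 0x25, IL 1, ISS 0x5, DFSC 0x05 and WnR 0, matching the decoded lines in the log. Since x2 is 0 in both oopses, the load address is simply x24 = 0xa8fff00000c0c224, the same as the reported fault address.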

===== df01_sh =====
command: df01.sh
<12>[   76.370093] /usr/local/bin/kirk[253]: starting test df01_sh (df01.sh)
Tainted kernel: kernel died recently, i.e. there was an OOPS or BUG
<1>[   76.603065] Unable to handle kernel paging request at virtual address a8fff00000c0c224
<1>[   76.603922] Mem abort info:
<1>[   76.604197]   ESR = 0x0000000096000005
<1>[   76.604638]   EC = 0x25: DABT (current EL), IL = 32 bits
<1>[   76.605128]   SET = 0, FnV = 0
<1>[   76.606996]   EA = 0, S1PTW = 0
<1>[   76.607274]   FSC = 0x05: level 1 translation fault
<1>[   76.607611] Data abort info:
<1>[   76.607897]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
<1>[   76.609765]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[   76.610958]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[   76.611652] [a8fff00000c0c224] address between user and kernel address ranges
<0>[   76.612130] Internal error: Oops: 0000000096000005 [#2] PREEMPT SMP
<4>[   76.613305] Modules linked in: btrfs blake2b_generic xor xor_neon raid6_pq zstd_compress sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm backlight ip_tables x_tables
<4>[   76.617688] CPU: 1 UID: 0 PID: 553 Comm: df01.sh Tainted: G D            6.13.0-rc3-next-20241218 #1
<4>[   76.620869] Tainted: [D]=DIE
<4>[   76.621184] Hardware name: linux,dummy-virt (DT)
<4>[   76.622671] pstate: 63402009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
<4>[ 76.623693] pc : __kmalloc_node_noprof (mm/slub.c:492 mm/slub.c:505 mm/slub.c:532 mm/slub.c:3993 mm/slub.c:4152 mm/slub.c:4293 mm/slub.c:4300)
<4>[ 76.624180] lr : __vmalloc_node_range_noprof (include/linux/slab.h:922 mm/vmalloc.c:3647 mm/vmalloc.c:3846)
<4>[   76.625290] sp : ffff80008258fa90
<4>[   76.626275] x29: ffff80008258fab0 x28: fff00000c2c98e80 x27: fff00000c48fd100
<4>[   76.626966] x26: fffffc1fc0303080 x25: 00000000ffffffff x24: a8fff00000c0c224
<4>[   76.627599] x23: 0000000000000dc0 x22: ffffa93386d87390 x21: 00000000ffffffff
<4>[   76.628603] x20: fff00000c0010400 x19: 0000000000000008 x18: 0000000000000000
<4>[   76.629618] x17: 0000000000000000 x16: ffff800082180000 x15: ffff800080000000
<4>[   76.630999] x14: fff00000c00203f0 x13: 00000ffff8000821 x12: 0000000000000000
<4>[   76.632089] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa93386d87390
<4>[   76.634293] x8 : ffff80008258f908 x7 : fff00000c2c98e80 x6 : 0000000000010000
<4>[   76.634816] x5 : ffffa93389379000 x4 : 0000000000000000 x3 : 000000000033b801
<4>[   76.636355] x2 : 0000000000000000 x1 : a8fff00000c0c224 x0 : fff00000c0010400
<4>[   76.638309] Call trace:
<4>[ 76.639031] __kmalloc_node_noprof+0x100/0x4a0 P
<4>[ 76.640890] __vmalloc_node_range_noprof (include/linux/slab.h:922 mm/vmalloc.c:3647 mm/vmalloc.c:3846)
<4>[ 76.641267] copy_process (kernel/fork.c:314 (discriminator 1) kernel/fork.c:1061 (discriminator 1) kernel/fork.c:2176 (discriminator 1))
<4>[ 76.641795] kernel_clone (kernel/fork.c:2758)
<4>[ 76.643003] __do_sys_clone (kernel/fork.c:2902)
<4>[ 76.644078] __arm64_sys_clone (kernel/fork.c:2869)
<4>[ 76.645306] invoke_syscall.constprop.0 (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
<4>[ 76.646337] do_el0_svc (include/linux/thread_info.h:135 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2) arch/arm64/kernel/syscall.c:151 (discriminator 2))
<4>[ 76.646974] el0_svc (arch/arm64/include/asm/irqflags.h:82 (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1) arch/arm64/kernel/entry-common.c:165 (discriminator 1) arch/arm64/kernel/entry-common.c:178 (discriminator 1) arch/arm64/kernel/entry-common.c:745 (discriminator 1))
<4>[ 76.647709] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:763)
<4>[ 76.649032] el0t_64_sync (arch/arm64/kernel/entry.S:600)
<0>[ 76.649724] Code: 12800019 b9402a82 aa1803e1 aa1403e0 (f8626b1a)

<trim>

All code
========
    0: 12800019 mov w25, #0xffffffff            // #-1
    4: b9402a82 ldr w2, [x20, #40]
    8: aa1803e1 mov x1, x24
    c: aa1403e0 mov x0, x20
   10:* f8626b1a ldr x26, [x24, x2] <-- trapping instruction

Code starting with the faulting instruction
===========================================
    0: f8626b1a ldr x26, [x24, x2]
<4>[   79.647693] ---[ end trace 0000000000000000 ]---
<0>[   79.649260] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
<2>[   79.650229] SMP: stopping secondary CPUs
<0>[   79.651558] Kernel Offset: 0x293306a00000 from 0xffff800080000000
<0>[   79.652015] PHYS_OFFSET: 0x40000000
<0>[   79.652461] CPU features: 0x000,000000d0,60bef2d8,cb7e7f3f
<0>[   79.653039] Memory Limit: none
<0>[   79.653854] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---


Links:
-------
  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241218/testrun/26396709/suite/log-parser-test/test/panic-multiline-kernel-panic-not-syncing-attempted-to-kill-init-exitcode/history/
  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241212/testrun/26277241/suite/log-parser-test/test/panic-multiline-kernel-panic-not-syncing-attempted-to-kill-init-exitcode/log
  - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2qNMDhPFtR8j185QSvZMn989u84
  - https://storage.tuxsuite.com/public/linaro/lkft/builds/2qNMCQazNJteQLGCw7MnMtUwzkD/
  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241211/testrun/26266202/suite/log-parser-test/test/panic-multiline-kernel-panic-not-syncing-attempted-to-kill-init-exitcode/details/


metadata:
----
   git describe: next-20241210..next-20241218
   git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
   kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2qNMCQazNJteQLGCw7MnMtUwzkD/config
   build url: https://storage.tuxsuite.com/public/linaro/lkft/builds/2qNMCQazNJteQLGCw7MnMtUwzkD/
   toolchain: gcc-13
   config: CONFIG_ARM64_64K_PAGES=y, CONFIG_ARM64_16K_PAGES=y
   arch: arm64
   qemu: qemu-arm64 version 9.1.2


--
Linaro LKFT
https://lkft.linaro.org
