Re: [PATCH 6.1 000/217] 6.1.95-rc1 review [parisc64/C3700 boot failures]

On 6/22/24 08:13, Helge Deller wrote:
On 6/22/24 16:58, Guenter Roeck wrote:
[ Copying parisc maintainers - maybe they can test on real hardware ]

On 6/19/24 05:54, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.95 release.
There are 217 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000.
Anything received after that time might be too late.

...
Oleg Nesterov <oleg@xxxxxxxxxx>
     zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING


I cannot explain it, but this patch causes all my parisc64 (C3700)
boot tests to crash. There are lots of memory corruption BUGs such as

[    0.000000] =============================================================================
[    0.000000] BUG kmalloc-96 (Not tainted): Padding overwritten. 0x0000000043411dd0-0x0000000043411f5f @offset=3536

ultimately followed by

[    0.462562] Unaligned handler failed, ret = -14
...
[    0.469160]  IAOQ[0]: idr_alloc_cyclic+0x48/0x118
[    0.469372]  IAOQ[1]: idr_alloc_cyclic+0x54/0x118
[    0.469548]  RP(r2): __kernfs_new_node.constprop.0+0x160/0x420
[    0.469782] Backtrace:
[    0.469928]  [<00000000404af108>] __kernfs_new_node.constprop.0+0x160/0x420
[    0.470285]  [<00000000404b0cac>] kernfs_new_node+0xbc/0x118
[    0.470523]  [<00000000404b158c>] kernfs_create_empty_dir+0x54/0xf0
[    0.470756]  [<00000000404b665c>] sysfs_create_mount_point+0x4c/0xb0
[    0.470996]  [<00000000401181cc>] cgroup_init+0x5b4/0x738
[    0.471213]  [<0000000040102220>] start_kernel+0x1238/0x1308
[    0.471429]  [<0000000040107c90>] start_parisc+0x188/0x1d0
...
[    0.474956] Kernel panic - not syncing: Attempted to kill the idle task!
SeaBIOS wants SYSTEM RESET.

This is with qemu v9.0.1.

Just to be sure, did you test the same kernel on physical hardware as well?

Please note that 64-bit hppa (C3700) support in qemu was added only recently
and is still considered experimental.
So, maybe it's not a bug in the source, but in qemu...?!?


Following up on this for everyone: Helge doesn't see the problem on real hardware.
I can make the problem disappear by any of the following:
- Use gcc 13.3 instead of 12.3
- Disable CONFIG_KUNIT
- Enable CONFIG_PAGE_POISONING (without actually enabling it at runtime)

Overall, that suggests some kind of heisenbug, most likely in qemu,
unrelated to the commit above.
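
For anyone trying to reproduce this, it may help to record which of the two
config options named above a given build actually has set. The following is
only a minimal sketch: it assumes the usual kernel .config line format
("CONFIG_FOO=y" or "# CONFIG_FOO is not set"), and the helper names
config_state() and report() are hypothetical, not part of any kernel tooling.

```python
import re

# Options named in the message above; nothing else is assumed about the config.
OPTIONS = ("CONFIG_KUNIT", "CONFIG_PAGE_POISONING")


def config_state(config_text, option):
    """Return the option's value ('y', 'm', ...) or 'n' if unset or absent."""
    # Require '=' right after the name so CONFIG_KUNIT does not match
    # longer names such as CONFIG_KUNIT_TEST.
    set_re = re.compile(r"^" + re.escape(option) + r"=(.*)$")
    unset_line = "# " + option + " is not set"
    for line in config_text.splitlines():
        line = line.strip()
        if line == unset_line:
            return "n"
        match = set_re.match(line)
        if match:
            return match.group(1)
    return "n"


def report(config_text, options=OPTIONS):
    """Map each option of interest to its state in the given .config text."""
    return {opt: config_state(config_text, opt) for opt in options}


if __name__ == "__main__":
    # Hypothetical two-line sample; a real run would read the build's .config.
    sample = "CONFIG_KUNIT=y\n# CONFIG_PAGE_POISONING is not set\n"
    for opt, state in report(sample).items():
        print(opt, state)
```

Running it over the .config of a failing build and of a passing build would
show whether the two differ in either option; the compiler version has to be
compared separately.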

Thanks, and sorry for the noise.

Guenter




