Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Finn,

On 07/09/21 15:28, Finn Thain wrote:
Hi All,

While running stress-ng on my Quadra 630, I observed a reproducible panic
in do_exit():

        if (unlikely(in_interrupt()))
                panic("Aiee, killing interrupt handler!");

The console log says,

running --mmap -1 --mmap-odirect --mmap-bytes 100% -t 60 --timestamp --no-rand-seed --times
stress-ng: 22:22:17.70 info:  [19803] setting to a 60 second run per stressor
stress-ng: 22:22:17.70 info:  [19803] dispatching hogs: 1 mmap
[12634.030000] Kernel panic - not syncing: Aiee, killing interrupt handler!
[12634.030000] CPU: 0 PID: 19805 Comm: stress-ng Not tainted 5.14.0-mac #2
[12634.030000] Stack from 00b27de4:
[12634.030000]         00b27de4 0045b8ea 0045b8ea 00020000 00b27e00 003cb320 0045b8ea 00b27e20
[12634.030000]         003ca326 00020000 418004fc 00b26000 060324c0 00b26000 00ae9680 00b27e5c
[12634.030000]         0001dcc2 00453506 00000009 418004fc 00b26000 00000000 0741b000 00000009
[12634.030000]         00000008 00b27f38 00ae9680 00000006 00000000 00000001 00b27e6c 0001de78
[12634.030000]         00000009 06034510 00b27eb8 000271bc 00000009 0000000f 0000000e c043c000
[12634.030000]         00000000 0741b000 00000003 00b27f98 efe91934 efe90898 00025fc6 00b26000
[12634.030000] Call Trace: [<00020000>] resource_list_free+0x36/0x52
[12634.030000]  [<003cb320>] dump_stack+0x10/0x16
[12634.030000]  [<003ca326>] panic+0xba/0x2bc
[12634.030000]  [<00020000>] resource_list_free+0x36/0x52
[12634.030000]  [<0001dcc2>] do_exit+0x87e/0x9d6
[12634.030000]  [<0001de78>] do_group_exit+0x28/0xb6
[12634.030000]  [<000271bc>] get_signal+0x126/0x720
[12634.030000]  [<00025fc6>] send_signal+0xde/0x16e
[12634.030000]  [<00004bfa>] do_notify_resume+0x38/0x60a
[12634.030000]  [<00027076>] force_sig_fault_to_task+0x36/0x3a
[12634.030000]  [<00027092>] force_sig_fault+0x18/0x1c
[12634.030000]  [<00006e68>] send_fault_sig+0x44/0xc6
[12634.030000]  [<00006204>] buserr_c+0x260/0x5ca
[12634.030000]  [<00002cd8>] do_signal_return+0x10/0x1a
[12634.030000]  [<0018800e>] ext4_fill_super+0x1da0/0x3842
[12634.030000]  [<0010800a>] fsnotify_flush_notify+0x3c/0x72
[12634.030000]
[12634.030000] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---


I haven't found the cause yet. While investigating I tried enabling
various debug options including CONFIG_DEBUG_SPINLOCK. I then got
backtrace below from the same machine.

# /root/stress-ng --mmap -1 --mmap-odirect --mmap-bytes=100% -t 1800
stress-ng: info:  [49] setting to a 1800 second (30 mins, 0.00 secs) run per stressor
stress-ng: info:  [49] dispatching hogs: 1 mmap
[  248.710000] BUG: spinlock trylock failure on UP on CPU#0, stress-ng/51
[  248.710000]  lock: 0x53d4a0, .magic: dead4ead, .owner: stress-ng/51, .owner_cpu: 0
[  248.710000] CPU: 0 PID: 51 Comm: stress-ng Not tainted 5.14.0-multi-debug #2
[  248.710000] Stack from 00b159ac:
[  248.710000]         00b159ac 004da40f 004da40f 00000000 00b159c8 0043e88e 004da40f 00b159f4
[  248.710000]         0006316c 004d240e 0053d4a0 dead4ead 0095ebba 00000033 00000000 0053d4a0
[  248.710000]         00b15ae4 0055e6ac 00b15a0c 000632b2 0053d4a0 004d2488 ffffebf7 0053d458
[  248.710000]         00b15a18 0044780e 0053d4a0 00b15a3c 002b8694 0053d4a0 00000001 00000040
[  248.710000]         4bfffffe 0000001a 00528200 00528260 00b15a60 00069f6a 0000000e 00014200
[  248.710000]         00528200 00b15a5c 00b12600 00528200 00014200 00b15a7c 00069ff0 00528200
[  248.710000] Call Trace: [<0043e88e>] dump_stack+0x10/0x16
[  248.710000]  [<0006316c>] spin_dump+0x6c/0xc0
[  248.710000]  [<000632b2>] do_raw_spin_trylock+0x32/0x80
[  248.710000]  [<0044780e>] _raw_spin_trylock+0xe/0x40
[  248.710000]  [<002b8694>] add_interrupt_randomness+0x154/0x1c0
[  248.710000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
[  248.710000]  [<00014200>] INV_L2+0x8/0x10
[  248.710000]  [<00014200>] INV_L2+0x8/0x10
[  248.710000]  [<00069ff0>] handle_irq_event+0x30/0x80
[  248.710000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
[  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[  248.710000]  [<0000b944>] via1_irq+0x44/0xc0
[  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[  248.710000]  [<00002ea4>] do_IRQ+0x24/0x40
[  248.710000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<002b86b0>] add_interrupt_randomness+0x170/0x1c0
[  248.710000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
[  248.710000]  [<00069ff0>] handle_irq_event+0x30/0x80
[  248.710000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
[  248.710000]  [<000695c0>] generic_handle_irq+0x0/0x80
[  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[  248.710000]  [<0000bad4>] via2_irq+0x54/0x80
[  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[  248.710000]  [<00002ea4>] do_IRQ+0x24/0x40
[  248.710000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<00404040>] rtm_dump_nexthop+0x80/0x180
[  248.710000]  [<00033f20>] irq_exit+0x60/0x80
[  248.710000]  [<00002eaa>] do_IRQ+0x2a/0x40
[  248.710000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
[  248.710000]  [<00025a9b>] _FP_CALL_TOP+0xcc9d/0xd512
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<00025a9b>] _FP_CALL_TOP+0xcc9d/0xd512
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<0019e3e4>] ext4_get_block+0x24/0x40
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<0013a8f6>] __block_write_begin_int+0x1b6/0x700
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<0011b176>] update_time+0x36/0x40
[  248.710000]  [<0013b3ee>] block_page_mkwrite+0xae/0x100
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<0019e3c0>] ext4_get_block+0x0/0x40
[  248.710000]  [<00005526>] sys_cacheflush+0x2a6/0x500
[  248.710000]  [<001a5676>] ext4_page_mkwrite+0x2f6/0x6c0
[  248.710000]  [<0019e3c0>] ext4_get_block+0x0/0x40
[  248.710000]  [<000d51c0>] find_vma+0x0/0x80
[  248.710000]  [<00444500>] down_read+0x0/0x200
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<0019e3c0>] ext4_get_block+0x0/0x40
[  248.710000]  [<000ff292>] fput+0x12/0x40
[  248.710000]  [<000cbbc0>] do_page_mkwrite+0x40/0x140
[  248.710000]  [<004478ce>] _raw_spin_unlock+0xe/0x40
[  248.710000]  [<000cd516>] do_wp_page+0x296/0x4c0
[  248.710000]  [<000cfedc>] handle_mm_fault+0x79c/0x900
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<00007402>] do_page_fault+0x102/0x240
[  248.710000]  [<0000653c>] buserr_c+0x17c/0x600
[  248.710000]  [<00002ba8>] buserr+0x20/0x28
[  248.710000]  [<0008800d>] get_next_modinfo+0xcd/0x100
[  248.710000]  [<0009800d>] cgroup1_pidlist_destroy_all+0x8d/0xc0


Apparently add_interrupt_randomness() got re-entered, causing
spin_trylock() to contest input_pool.lock. But I don't see anything wrong
there -- false positive?

That can happen if interrupts are nested. In that sense, it's a false positive from CONFIG_DEBUG_SPINLOCK (failing to take a lock that the code knows how to deal with shouldn't cause a warning).

(This very event of reentering add_interrupt_randomness() from a nested interrupt caused an arcane bug in get_reg() to write beyond the end of memory on my Falcon. get_reg() uses READ_ONCE()/WRITE_ONCE() to guard against the effects of reentry now. That's why I said I have a hunch that we've been here before...

You could argue that disabling interrupts around that entropy mixing step would be an alternative, but Linus didn't like that for get_reg() either :-) )


I was able to reproduce that on both QEMU and Aranym:

[    0.000000] Linux version 5.14.0-multi-debug (fthain@nippy) (m68k-linux-gnu-gcc (btc) 6.4.0, GNU ld (btc) 2.28) #2 Sat Sep 4 15:58:43 AEST 2021
[    0.000000] Saving 202 bytes of bootinfo
[    0.000000] printk: console [debug0] enabled
[    0.000000] printk: debug: ignoring loglevel setting.
Blitter tried to read byte from register ff8a00 at 00816a
[    0.000000] Atari hardware found: VIDEL STDMA-SCSI ST_MFP YM2149 PCM CODEC DSP56K SCC ANALOG_JOY BLITTER IDE TT_CLK FDC_SPEED
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000000000-0x00000081ffffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000000dfffff]
[    0.000000]   node   0: [mem 0x0000000001000000-0x00000000081fffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000000081fffff]
[    0.000000] NatFeats found (ARAnyM, 1.0)
[    0.000000] initrd: 07f3ea00 - 08200000
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 32475
[    0.000000] Kernel command line: debug=par console=tty0 ignore_loglevel initcall_blacklist=ide_falcon_driver_init,nfhd_init BOOT_IMAGE=vmlinux
[    0.000000] blacklisting initcall ide_falcon_driver_init
[    0.000000] blacklisting initcall nfhd_init
[    0.000000] Unknown command line parameters: initcall_blacklist=ide_falcon_driver_init BOOT_IMAGE=vmlinux
[    0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[    0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 120072K/131072K available (4385K kernel code, 357K rwdata, 772K rodata, 148K init, 121K bss, 11000K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 200
[    0.000000] clocksource: mfp: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 77769386670404 ns
[    0.010000] Console: colour dummy device 80x25
[    0.010000] printk: console [tty0] enabled
[    0.010000] Calibrating delay loop... 132.50 BogoMIPS (lpj=662528)
[    0.090000] pid_max: default: 32768 minimum: 301
[    0.090000] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.090000] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.110000] devtmpfs: initialized
[    0.120000] random: get_random_u32 called from bucket_table_alloc+0x13e/0x180 with crng_init=0
[    0.120000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.130000] futex hash table entries: 256 (order: 0, 7168 bytes, linear)
[    0.130000] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.130000] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
[    0.140000] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.190000] wait_for_initramfs() called before rootfs_initcalls
[    0.200000] SCSI subsystem initialized
[    0.200000] libata version 3.00 loaded.
[    0.210000] clocksource: Switched to clocksource mfp
[    0.280000] NET: Registered PF_INET protocol family
[    0.280000] IP idents hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.280000] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 6144 bytes, linear)
[    0.290000] TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.290000] TCP bind hash table entries: 1024 (order: 2, 20480 bytes, linear)
[    0.290000] TCP: Hash tables configured (established 1024 bind 1024)
[    0.290000] UDP hash table entries: 256 (order: 1, 12288 bytes, linear)
[    0.290000] UDP-Lite hash table entries: 256 (order: 1, 12288 bytes, linear)
[    0.300000] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.300000] RPC: Registered named UNIX socket transport module.
[    0.300000] RPC: Registered udp transport module.
[    0.300000] RPC: Registered tcp transport module.
[    0.310000] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.310000] initcall nfhd_init blacklisted
[    0.310000] Trying to unpack rootfs image as initramfs...
[    0.330000] nfeth: API 5
[    0.330000] workingset: timestamp_bits=30 max_order=15 bucket_order=0
[    0.700000] Freeing initrd memory: 2820K
[    0.710000] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    0.720000] io scheduler mq-deadline registered
[    0.720000] atafb atafb: phys_screen_base 6dc000 screen_len 311296
[    0.730000] atafb atafb: Determined 640x480, depth 4
[    0.730000] atafb atafb:    virtual 640x972
[    0.740000] Console: switching to colour frame buffer device 80x30
[    0.750000] fb0: frame buffer device, using 304K of video memory
[    0.750000] pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>)
[    0.750000] Non-volatile memory driver v1.3
[    0.780000] brd: module loaded
[    1.080000] scsi host0: Atari native SCSI, irq 15, io_port 0x0, base 0x0, can_queue 1, cmd_per_lun 2, sg_tablesize 1, this_id 7, flags { }
[    1.960000] random: fast init done
[    2.890000] atari-falcon-ide atari-falcon-ide: Atari Falcon and Q40/Q60 PATA controller
[    2.950000] scsi host1: pata_falcon
[    2.970000] ata1: PATA max PIO4 cmd 0xfff00000 ctl 0xfff00038 no IRQ, using PIO polling
[    3.020000] aoe: AoE v85 initialised.
[    3.050000] mousedev: PS/2 mouse device common for all mice
[    3.330000] input: Atari Keyboard as /devices/virtual/input/input0
[    3.350000] ata1.00: NODEV after polling detection
[    3.370000] ata1.01: ATA-2: slave, , max PIO2
[    3.390000] ata1.01: 322560 sectors, multi 0: LBA
[    3.400000] ata1.01: configured for PIO
[    3.430000] scsi 1:0:1:0: Direct-Access     ATA      slave            n/a  PQ: 0 ANSI: 5
[    3.490000] scsi 1:0:1:0: Attached scsi generic sg0 type 0
[    3.520000] sd 1:0:1:0: [sda] 322560 512-byte logical blocks: (165 MB/158 MiB)
[    3.570000] sd 1:0:1:0: [sda] Write Protect is off
[    3.590000] sd 1:0:1:0: [sda] Mode Sense: 00 3a 00 00
[    3.620000] sd 1:0:1:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    3.670000] rtc-generic rtc-generic: registered as rtc0
[    3.690000] NET: Registered PF_PACKET protocol family
[    3.730000] sd 1:0:1:0: [sda] Attached SCSI disk
[    3.760000] Freeing unused kernel image (initmem) memory: 148K
[    3.790000] This architecture does not have kernel memory protection.
[    3.810000] Run /init as init process
[    3.840000]   with arguments:
[    3.870000]     /init
[    3.880000]   with environment:
[    3.910000]     HOME=/
[    3.920000]     TERM=linux
[    3.940000]     initcall_blacklist=ide_falcon_driver_init
[    3.970000]     BOOT_IMAGE=vmlinux
[   85.240000] random: crng init done
[  139.730000] EXT4-fs (sda): mounting ext2 file system using the ext4 subsystem
[  139.770000] EXT4-fs (sda): mounted filesystem without journal. Opts: (null). Quota mode: disabled.
[ 4089.070000] BUG: spinlock trylock failure on UP on CPU#0, stress-ng/1341
[ 4089.070000]  lock: 0x53d4a0, .magic: dead4ead, .owner: stress-ng/1341, .owner_cpu: 0
[ 4089.070000] CPU: 0 PID: 1341 Comm: stress-ng Not tainted 5.14.0-multi-debug #2
[ 4089.070000] Stack from 00a7dcc8:
[ 4089.070000]         00a7dcc8 004da40f 004da40f 00000000 00a7dce4 0043e88e 004da40f 00a7dd10
[ 4089.070000]         0006316c 004d240e 0053d4a0 dead4ead 00956bba 0000053d 00000000 0053d4a0
[ 4089.070000]         00a7ddd8 0055e6ac 00a7dd28 000632b2 0053d4a0 004d2488 0005c81b 0053d458
[ 4089.070000]         00a7dd34 0044780e 0053d4a0 00a7dd58 002b8694 0053d4a0 00000001 0000007f
[ 4089.070000]         00860000 00000011 00528138 00528198 00a7dd7c 00069f6a 0000000d 00014200
[ 4089.070000]         00528138 00a7dd78 00a7def0 00528138 00014200 00a7dd98 00069ff0 00528138
[ 4089.070000] Call Trace: [<0043e88e>] dump_stack+0x10/0x16
[ 4089.070000]  [<0006316c>] spin_dump+0x6c/0xc0
[ 4089.070000]  [<000632b2>] do_raw_spin_trylock+0x32/0x80
[ 4089.070000]  [<0005c81b>] init_dl_bandwidth+0x1b/0x80
[ 4089.070000]  [<0044780e>] _raw_spin_trylock+0xe/0x40
[ 4089.070000]  [<002b8694>] add_interrupt_randomness+0x154/0x1c0
[ 4089.070000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
[ 4089.070000]  [<00014200>] INV_L2+0x8/0x10
[ 4089.070000]  [<00014200>] INV_L2+0x8/0x10
[ 4089.070000]  [<00069ff0>] handle_irq_event+0x30/0x80
[ 4089.070000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
[ 4089.070000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[ 4089.070000]  [<00002ea4>] do_IRQ+0x24/0x40
[ 4089.070000]  [<00002da8>] user_irqvec_fixup+0xc/0x14
[ 4089.070000]  [<0005c81a>] init_dl_bandwidth+0x1a/0x80
[ 4089.070000]  [<000d51c0>] find_vma+0x0/0x80
[ 4089.070000]  [<00444500>] down_read+0x0/0x200
[ 4089.070000]  [<002b86b0>] add_interrupt_randomness+0x170/0x1c0
[ 4089.070000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
[ 4089.070000]  [<00069ff0>] handle_irq_event+0x30/0x80
[ 4089.070000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
[ 4089.070000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[ 4089.070000]  [<00002ea4>] do_IRQ+0x24/0x40
[ 4089.070000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
[ 4089.070000]  [<0000653c>] buserr_c+0x17c/0x600
[ 4089.070000]  [<00002ba8>] buserr+0x20/0x28
[ 4089.070000]  [<0008800d>] get_next_modinfo+0xcd/0x100
[ 4089.070000]

Does anyone know what causes this?

Our practice to run interrupt handlers at the IPL of the current interrupt under service, instead of disabling local interrupts for the duration of the handler?

Not sure whether SMP systems keep a shared entropy pool, or multiple per-cpu ones, but in this particular instance, the spin lock prevented further damage. I'm worried about other cases where re-entering the common interrupt code manipulates shared state that goes undetected. But then, that code has certainly been audited many times...

Cheers,

	Michael




[Index of Archives]     [Video for Linux]     [Yosemite News]     [Linux S/390]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux