BUG: kernel panic after jbd bugs / kernel paging request

kardan <kardan@xxxxxxxxxx> · Tue, 20 Aug 2013 08:49:29 +0200

Dear developers,

First of all, thanks for all your work!

kernel version: 3.9.9-t23
kernel config: https://paste.debian.net/27351/
lspci -nvv output is attached.

I have merged two kernel issues into one mail to make it easier to
spot any relation between them.

As both appeared only once, I have not spent more time trying newer
kernels, but I will do so to test patches. Please give me pointers on
where to dig in order to reproduce them.

1) jbd2_journal_dirty_metadata

I reported this in #linuxfs and was advised to forward it here.

12:55 < kardan:#linuxfs> it seems like my hdd is hanging (hdd led
turned. jbd is busy for over an hour now:
1326 be/3 root          0.00 B     16.00 K  0.00 % 98.47 % [jbd2/sda1-8]

The load was caused by iceape (or something stacked below it):
#1  0xb764680e in wait4 () at ../sysdeps/unix/syscall-template.S:81
#2  0xb76467e7 in __wait3 (stat_loc=..., options=0, usage=0x0)
at ../sysdeps/unix/bsd/bsd4.4/wait3.c:3312:55 

This led to several jbd-related kernel bugs and, in the end, a kernel
panic. I attached the jbd-schedulings-bugs to avoid line-wrapping issues.

jbd2_journal_dirty_metadata+0x162/0x188
kmem_cache_alloc+0x26/0x9f
spin_unlock.isra.6+0x1e/0x1e
ext4_file_open+0x13e/0x1b2 
spin_lock.isra.7+0xa/0xb 
__d_instantiate+0x59/0x63 
fsnotify_perm+0x4d/0x58

__schedule_bug+0x39/0x49
__schedule+0x54/0x4e4 
ttwu_do_wakeup.constprop.111+0x39/0x56
try_to_wake_up+0xe7/0xef 
autoremove_wake_function+0xd/0x29 
activate_page+0xae/0xfc 
__cond_resched+0xf/0x19 
_cond_resched+0x10/0x18
Aug 15 18:06:10 delight
unmap_single_vma+0x3fc/0x49c 
unmap_vmas+0x30/0x4d 
exit_mmap+0x68/0xcb
get_signal_to_deliver+0x202/0x4d1 

kmem_cache_alloc+0x26/0x9f
spin_unlock.isra.6+0x1e/0x1e
ext4_file_open+0x13e/0x1b2 
fsnotify+0x1fa/0x22c
__d_instantiate+0x59/0x63

__schedule_bug+0x39/0x49
__schedule+0x54/0x4e4
blk_peek_request+0x16f/0x1a4 
scsi_request_fn+0x35d/0x3fe 
activate_page+0xae/0xfc 
__cond_resched+0xf/0x19 
_cond_resched+0x10/0x18
unmap_single_vma+0x3fc/0x49c 
unmap_vmas+0x30/0x4d 
exit_mmap+0x68/0xcb
get_signal_to_deliver+0x202/0x4d1

__schedule_bug+0x39/0x49
__schedule+0x54/0x4e4 
__free_one_page+0xeb/0x1c4 
free_pcppages_bulk+0xbb/0x103
__cond_resched+0xf/0x19 
_cond_resched+0x10/0x18 
unmap_single_vma+0x3fc/0x49c 
unmap_vmas+0x30/0x4d
exit_mmap+0x68/0xcb
get_signal_to_deliver+0x202/0x4d1

__schedule_bug+0x39/0x49
__schedule+0x54/0x4e4 
smp_apic_timer_interrupt+0x58/0x60 
apic_timer_interrupt+0x34/0x3c
activate_page+0xae/0xfc
__cond_resched+0xf/0x19 
_cond_resched+0x10/0x18 
unmap_single_vma+0x3fc/0x49c 
unmap_vmas+0x30/0x4d 
exit_mmap+0x68/0xcb

__schedule_bug+0x39/0x49
__schedule+0x54/0x4e4
vm_acct_memory+0x26/0x3c
__cache_free.isra.57+0xf/0x8f
percpu_counter_add.constprop.21+0x26/0x3e
spin_lock.isra.7+0xa/0xb
dput+0x11/0x96
spin_unlock.isra.11+0xa/0x1e 
__fput+0x15f/0x17e
mnt_add_count.isra.16+0x1c/0x34 
__cond_resched+0xf/0x19 
_cond_resched+0x10/0x18 
task_work_run+0x4f/0x5a 
do_exit+0x2c6/0x796 
kmsg_dump+0x1d/0xcc 
oops_end+0x86/0x8a 
do_bounds+0x4c/0x4c

Full log: https://paste.debian.net/27347/
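
To compare the traces above, a minimal Python sketch (an illustration only, not part of the original logs) can split a pasted log into traces at blank lines and list the symbols that every trace shares. The names `split_traces` and `common_frames` are hypothetical helpers, and the frame pattern assumes the `symbol+0xoffset/0xsize` form seen in this mail.

```python
import re

# Matches frames such as "unmap_single_vma+0x3fc/0x49c", optionally preceded
# by a dmesg timestamp "[289200.502703]" and an address "[<c13310e9>] ?".
FRAME_RE = re.compile(
    r"^\s*(?:\[\s*\d+\.\d+\]\s*)?(?:\[<[0-9a-f]+>\]\s*\??\s*)?"
    r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def split_traces(text):
    """Split text into traces (lists of symbol names) at blank lines.

    Non-blank lines that are not frames (e.g. a stray syslog prefix or a
    truncated symbol) are ignored and do not end the current trace.
    """
    traces, current = [], []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            current.append(match.group(1))
        elif not line.strip() and current:
            traces.append(current)
            current = []
    if current:
        traces.append(current)
    return traces


def common_frames(traces):
    """Return the symbols present in every trace, ordered as in the first."""
    if not traces:
        return []
    shared = set(traces[0]).intersection(*(set(t) for t in traces[1:]))
    return [sym for sym in dict.fromkeys(traces[0]) if sym in shared]
```

Feeding the pasted traces through `common_frames(split_traces(text))` gives a quick view of which call path the reports have in common; it says nothing about the root cause by itself.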

2) unable to handle kernel paging request

INFO: task kswapd0:21 blocked for more than 120 seconds.
[289200.502665]  [<c10b4258>] ? kmem_cache_alloc+0x2f/0x9f
[289200.502677]  [<c108b07f>] ? mempool_alloc+0x3b/0xee
[289200.502690]  [<c104c01f>] ? timekeeping_get_ns.constprop.
[289200.502703]  [<c13310e9>] ? io_schedule+0x34/0x47
[289200.502715]  [<c117e062>] ? get_request+0x416/0x4ae
[289200.502728]  [<c1005a8f>] ? native_sched_clock+0x48/0x94
[289200.502741]  [<c11811f1>] ? ioc_lookup_icq+0x41/0x49

[289800.503037] INFO: task kswapd0:21 blocked for more than 120 seconds.
[289800.503126]  [<c104300b>] ? sched_slice.isra.36+0x67/0x85
[289800.503139]  [<c104c01f>] ? timekeeping_get_ns.constprop.
[289800.503153]  [<c13310e9>] ? io_schedule+0x34/0x47
[289800.503165]  [<c117e062>] ? get_request+0x416/0x4ae
[289800.503178]  [<c11811f1>] ? ioc_lookup_icq+0x41/0x49 
[289800.503189]  [<c1038faa>] ? abort_exclusive_wait+0x64/0x64 
[289800.503199]  [<c117f938>] ? blk_queue_bio+0x185/0x26d
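
For going through the full log, a small Python sketch (an illustration under assumptions, not a tool used for this report) can pull out every "blocked for more than N seconds" message with its task name, PID, and dmesg timestamp when one is present. The helper name `hung_task_reports` is hypothetical, and the pattern assumes the message wording shown above.

```python
import re

# Hung-task message as it appears above; the "[seconds.micros]" dmesg
# timestamp is optional because the first report in the mail lacks it.
HUNG_RE = re.compile(
    r"^(?:\[\s*(?P<ts>\d+\.\d+)\]\s*)?"
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def hung_task_reports(lines):
    """Return (timestamp or None, task name, pid, threshold) per report."""
    reports = []
    for line in lines:
        match = HUNG_RE.match(line.strip())
        if match:
            ts = float(match.group("ts")) if match.group("ts") else None
            reports.append(
                (ts, match.group("comm"), int(match.group("pid")),
                 int(match.group("secs")))
            )
    return reports
```

Calling `hung_task_reports(open(path))` on a saved log (with `path` being wherever the log was stored) yields one tuple per report, which makes it easy to see which tasks were blocked and how far apart the reports are.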

This issue dates back some weeks; sorry for not reporting it earlier.

I had two occurrences of this, several days apart.
One week before the first occurrence, a new RAM bank and a PCMCIA card
USB hub were added to the laptop.

Some days ago I saw a lot of I/O errors once; they did not reappear.
On #linux-fs it was said that the first one looks like a use-after-free
or some other type of software-induced memory corruption:

"Those tend to be nasty problems that can take months to track down.
Some of the crazy-looking problems end up as bad hardware.

Have you also experienced crashes of userspace programs?"
kswapd/kworker were followed by Xorg, iceweasel, claws and Xorg.

Awesome was unresponsive afterwards and I needed to restart lightdm.
In a new X session, parts of old windows reappeared; this was
reproducible.

Log is attached.

-- 
Kardan <kardan@xxxxxxxxxx>
Encrypt your email: http://gnupg.org/documentation
Public GPG key 9D6108AE58C06558 at hkp://pool.sks-keyservers.net
fpr: F72F C4D9 6A52 16A1 E7C9  AE94 9D61 08AE 58C0 6558
Attachment: kernel-paging-bug
Description: Binary data

Attachment: lspci
Description: Binary data

Attachment: signature.asc
Description: PGP signature