Hi, On Thu, Oct 13, 2011 at 12:09 PM, Jan Kara <jack@xxxxxxx> wrote: > Hello, > > On Mon 03-10-11 01:44:11, Justin Gottula wrote: >> Please tell me if this is the right place to report this; it appears to >> be an ext4-specific issue. > Yes, it is the right place. Thanks for your report. > >> I had a kernel panic occur today during some particularly heavy hard >> drive activity on an ext4 filesystem, running a bleeding edge kernel >> (3.1-rc8). Unfortunately it's not an easily reproducible bug, it was >> pretty much a random occurrence. > Have you seen the bug since the report? Perhaps fortunately, the bug hasn't cropped up since I first reported it. > >> Here's the excerpt from dmesg (these were the only pertinent lines): >> >> [ 5922.546229] BUG: unable to handle kernel NULL pointer dereference at >> (null) >> [ 5922.546272] IP: [<ffffffffa00ccac8>] jbd2_journal_file_inode+0x38/0xf0 [jbd2] >> [ 5922.546308] PGD 195b3f067 PUD 1c1a70067 PMD 0 >> [ 5922.546339] Oops: 0000 [#1] PREEMPT SMP >> [ 5922.546361] CPU 1 >> [ 5922.546370] Modules linked in: it87 hwmon_vid coretemp ipv6 vboxnetflt >> vboxnetadp vboxdrv fuse loop uvcvideo joydev videodev snd_usb_audio usbhid >> usb_storage media hid xpad v4l2_compat_ioctl32 ff_memless snd_usbmidi_lib uas >> evdev ppdev radeon sg snd_hda_codec_hdmi parport_pc snd_hda_codec_realtek >> i2c_i801 psmouse parport serio_raw iTCO_wdt ttm floppy iTCO_vendor_support >> drm_kms_helper drm snd_oxygen snd_hda_intel snd_oxygen_lib i2c_algo_bit >> processor intel_agp snd_hda_codec intel_gtt i2c_core snd_hwdep button >> snd_mpu401_uart snd_rawmidi snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq >> snd_seq_device snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd >> soundcore pcspkr r8169 mii xhci_hcd btrfs lzo zlib_deflate libcrc32c crc32c ext4 >> mbcache jbd2 crc16 uhci_hcd ehci_hcd usbcore sr_mod cdrom ahci libahci sd_mod >> pata_jmicron libata scsi_mod >> [ 5922.546911] >> [ 5922.546919] Pid: 951, comm: flush-8:16 Not tainted 3.1.0-rc8-mainline #1 >> Gigabyte Technology Co., Ltd. EP35-DS3R/EP35-DS3R >> [ 5922.546963] RIP: 0010:[<ffffffffa00ccac8>] [<ffffffffa00ccac8>] >> jbd2_journal_file_inode+0x38/0xf0 [jbd2] >> [ 5922.546999] RSP: 0000:ffff88020fda5960 EFLAGS: 00010246 >> [ 5922.547019] RAX: 00000000fffffffb RBX: ffff8801293f5500 RCX: 0000000000000001 >> [ 5922.547043] RDX: ffff88020e5c7c00 RSI: 0000000000000000 RDI: ffff880202ada750 >> [ 5922.547067] RBP: ffff88020fda5980 R08: 00000000182c0e00 R09: 0000000000000001 >> [ 5922.547091] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 >> [ 5922.547115] R13: ffff88020fdb0800 R14: ffff880202ada750 R15: ffff88004f00e890 >> [ 5922.547139] FS: 0000000000000000(0000) GS:ffff88021fc80000(0000) >> knlGS:0000000000000000 >> [ 5922.547166] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> [ 5922.547185] CR2: 0000000000000000 CR3: 000000014411a000 CR4: 00000000000406e0 >> [ 5922.547209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 5922.547233] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> [ 5922.547256] Process flush-8:16 (pid: 951, threadinfo ffff88020fda4000, task >> ffff88021012b250) >> [ 5922.547284] Stack: >> [ 5922.547296] ffff88020fda5a80 000000000000bada 0000000000000001 >> ffff880202ada750 >> [ 5922.547334] ffff88020fda5a00 ffffffffa00ef35c ffffea00046022c0 >> ffffffffa00efd3b >> [ 5922.547369] ffffea00081e0840 0000000000000000 ffffffffa0125630 >> 00000000072e62da >> [ 5922.547403] Call Trace: >> [ 5922.547419] [<ffffffffa00ef35c>] mpage_da_map_and_submit+0x2bc/0x400 [ext4] >> [ 5922.547449] [<ffffffffa00efd3b>] ? ext4_da_writepages+0x35b/0x6b0 [ext4] >> [ 5922.547476] [<ffffffffa00efd86>] ext4_da_writepages+0x3a6/0x6b0 [ext4] >> [ 5922.547503] [<ffffffff81108534>] do_writepages+0x24/0x40 >> [ 5922.547523] [<ffffffff811848a3>] writeback_single_inode+0x183/0x490 >> [ 5922.547546] [<ffffffff81184faa>] writeback_sb_inodes+0x1aa/0x270 >> [ 5922.547567] [<ffffffff8118510e>] __writeback_inodes_wb+0x9e/0xd0 >> [ 5922.547589] [<ffffffff811854db>] wb_writeback+0x39b/0x3d0 >> [ 5922.547610] [<ffffffff811855a8>] wb_check_old_data_flush+0x98/0xa0 >> [ 5922.547632] [<ffffffff811865fc>] wb_do_writeback+0x1ac/0x250 >> [ 5922.547653] [<ffffffff81071770>] ? init_timer_deferrable_key+0x30/0x30 >> [ 5922.547676] [<ffffffff8118672c>] bdi_writeback_thread+0x8c/0x340 >> [ 5922.547698] [<ffffffff811866a0>] ? wb_do_writeback+0x250/0x250 >> [ 5922.547720] [<ffffffff810846fc>] kthread+0x8c/0xa0 >> [ 5922.547738] [<ffffffff81408734>] kernel_thread_helper+0x4/0x10 >> [ 5922.547759] [<ffffffff81084670>] ? kthread_worker_fn+0x190/0x190 >> [ 5922.547780] [<ffffffff81408730>] ? gs_change+0x13/0x13 >> [ 5922.547797] Code: 89 65 e8 4c 89 6d f0 4c 89 75 f8 66 66 66 66 90 b8 fb ff ff >> ff 48 8b 1f 49 89 f4 f6 47 14 04 4c 8b 2b 75 72 41 f6 45 00 02 75 6b >> [ 5922.548006] RIP [<ffffffffa00ccac8>] jbd2_journal_file_inode+0x38/0xf0 >> [jbd2] >> [ 5922.548036] RSP <ffff88020fda5960> >> [ 5922.548048] CR2: 0000000000000000 >> [ 5922.607946] ---[ end trace 6f21f07839837a01 ]--- >> >> I learned how to make gdb give me the offending lines from the call stack. Let >> me know if this is sufficient information or if there's a better way to do this: >> >> [<ffffffffa00ccac8>] jbd2_journal_file_inode+0x38/0xf0 [jbd2] >> Line 2142 of "fs/jbd2/transaction.c" >> [<ffffffffa00ef35c>] mpage_da_map_and_submit+0x2bc/0x400 [ext4] >> Line 1478 of "fs/ext4/inode.c" >> [<ffffffffa00efd3b>] ? ext4_da_writepages+0x35b/0x6b0 [ext4] >> Line 2140 of "fs/ext4/inode.c" >> [<ffffffffa00efd86>] ext4_da_writepages+0x3a6/0x6b0 [ext4] >> Line 2160 of "fs/ext4/inode.c" >> [<ffffffff81108534>] do_writepages+0x24/0x40 >> Line 1264 of "mm/page-writeback.c" >> [ 5922.547523] [<ffffffff811848a3>] writeback_single_inode+0x183/0x490 >> Line 405 of "fs/fs-writeback.c" >> >> If you need my kernel config or other relevant details, I can provide them. I >> still have the source tree, config, and other bits and pieces that I used to >> build the kernel so I'm in a good position to help diagnose the problem. > Thanks for those details. So the problem was that jinode was NULL (i.e. > inode written back by ext4_da_writepages() didn't have > EXT4(inode)->i_jinode allocated). Allocation of jinode happens when a file is > opened for writing. So the question is, how an inode could get dirty data > without being opened for writing. Did you do something unusual when the bug > triggered? Something like defragmentation, inode migration, or other > things? I wasn't doing anything out of the ordinary, including defragging or any other sort of FS maintenance. I had, however, started an rsync operation which wrote to the ext4 filesystem in question just before the kernel seized up. I do keep a VirtualBox instance running in the background a majority of the time, and its hard drive file is on the same filesystem, so it might possibly have had something to do with it; however, based on the timing of the panic I would wager it had more to do with rsync. > > Honza > -- > Jan Kara <jack@xxxxxxx> > SUSE Labs, CR > Justin -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html