On Tue, Oct 11, 2011 at 11:17:57AM +0200, Anders Ossowicki wrote:
> We seem to have hit a bug on our brand-new disk with an XFS filesystem on the
> 2.6.38.8 kernel. The disk is 2 Dell MD1220 enclosures with Intel SSDs daisy
> chained behind an LSI MegaRAID SAS 9285-8e raid controller. It was under heavy
> I/O load, 1-200 MB/s r/w from postgres for about a week before the bug showed
> up. The system itself is a Dell PowerEdge R815 with 32 cpu cores and 256G
> memory.
>
> Support for the 9285-8e controller was introduced as part of a series of
> patches for drivers/scsi/megaraid in 2.6.38 (0d49016b..cd50ba8e). Given that
> the megaraid driver support for the 9285-8e controller is so new it might be
> the real source of the issue, but this is pure speculation on my part. Any
> suggestions would be most welcome.
>
> The full dmesg is available at
> http://dev.exherbo.org/~arkanoid/kat-dmesg-2011-10.txt
>
> BUG: unable to handle kernel paging request at 000000000040403c
> IP: [<ffffffff810f8d71>] find_get_pages+0x61/0x110
> PGD 0
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/cpu/cpu31/cache/index2/shared_cpu_map
> CPU 11
> Modules linked in: btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs
> minix ntfs vfat msdos fat jfs xfs reiserfs nfsd exportfs nfs lockd nfs_acl
> auth_rpcgss sunrpc autofs4 psmouse serio_raw joydev ixgbe lp amd64_edac_mod
> i2c_piix4 dca parport edac_core bnx2 power_meter dcdbas mdio edac_mce_amd ses
> enclosure usbhid hid ahci mpt2sas libahci scsi_transport_sas megaraid_sas
> raid_class
>
> Pid: 27512, comm: flush-8:32 Tainted: G W 2.6.38.8 #1 Dell Inc.
> PowerEdge R815/04Y8PT
> RIP: 0010:[<ffffffff810f8d71>] [<ffffffff810f8d71>] find_get_pages+0x61/0x110

This is core VM code, and operates purely on on-stack variables except
for the page cache radix tree nodes / pages. So this either could be a
core VM bug that no one has noticed yet, or memory corruption. Can you
run memtest86 on the box?

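One quick sanity check on the corruption theory, as a minimal sketch rather than a diagnosis. It assumes the faulting instruction bytes `<44> 8b 47 08` in the Code: line of the quoted oops decode to `mov 0x8(%rdi),%r8d` (my reading of the bytes, not something the report states), so the fault address should be RDI + 8. The helper names and the 0xffff800000000000 boundary used for the kernel half of the x86-64 address space are illustrative assumptions; the register values are copied from the oops.

```python
# Register values copied from the quoted oops.
RDI = 0x0000000000404034  # base register of the faulting load
R10 = 0xFFFFEA00A0FF7788  # another register from the same dump, for comparison
CR2 = 0x000000000040403C  # address the CPU reported as faulting

# Assumption: "<44> 8b 47 08" is "mov 0x8(%rdi),%r8d",
# i.e. a 4-byte load from RDI + 8.
DISPLACEMENT = 0x8

# Assumption: kernel addresses on x86-64 live in the upper canonical half.
KERNEL_SPACE_START = 0xFFFF800000000000


def fault_address(base: int, displacement: int) -> int:
    """Effective address of a base+displacement memory operand."""
    return base + displacement


def looks_like_kernel_pointer(addr: int) -> bool:
    """Rough check: is the address in the upper (kernel) half?"""
    return addr >= KERNEL_SPACE_START


if __name__ == "__main__":
    print("RDI + 8 == CR2:", fault_address(RDI, DISPLACEMENT) == CR2)
    print("RDI looks like a kernel pointer:", looks_like_kernel_pointer(RDI))
    print("R10 looks like a kernel pointer:", looks_like_kernel_pointer(R10))
```

If the decode is right, RDI + 8 matches CR2 exactly, and 0x404034 is not a kernel address while R10 from the same dump is. That would be consistent with a bad value having been read where a page pointer was expected, but it does not distinguish between a VM bug and corrupted memory, so the memtest86 run is still worth doing.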
> RSP: 0018:ffff881fdee55800 EFLAGS: 00010246
> RAX: ffff8814a66d7000 RBX: ffff881fdee558c0 RCX: 000000000000000e
> RDX: 0000000000000005 RSI: 0000000000000001 RDI: 0000000000404034
> RBP: ffff881fdee55850 R08: 0000000000000001 R09: 0000000000000002
> R10: ffffea00a0ff7788 R11: ffff88129306ac88 R12: 0000000000031535
> R13: 000000000000000e R14: ffff881fdee558e8 R15: 0000000000000005
> FS: 00007fec9ce13720(0000) GS:ffff88181fc80000(0000) knlGS:00000000f744d6d0
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000040403c CR3: 0000000001a03000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process flush-8:32 (pid: 27512, threadinfo ffff881fdee54000, task ffff881fdf4adb80)
> Stack:
> 0000000000000000 0000000000000000 0000000000000000 ffff8832e7edf6e0
> 0000000000000000 ffff881fdee558b0 ffffea008b443c18 0000000000031535
> ffff8832e7edf590 ffff881fdee55d20 ffff881fdee55870 ffffffff81101f92
> Call Trace:
> [<ffffffff81101f92>] pagevec_lookup+0x22/0x30
> [<ffffffffa033e00d>] xfs_cluster_write+0xad/0x180 [xfs]
> [<ffffffffa033e4f4>] xfs_vm_writepage+0x414/0x4f0 [xfs]
> [<ffffffff810ffb77>] __writepage+0x17/0x40
> [<ffffffff81100d95>] write_cache_pages+0x1c5/0x4a0
> [<ffffffff810ffb60>] ? __writepage+0x0/0x40
> [<ffffffff81101094>] generic_writepages+0x24/0x30
> [<ffffffffa033d5dd>] xfs_vm_writepages+0x5d/0x80 [xfs]
> [<ffffffff811010c1>] do_writepages+0x21/0x40
> [<ffffffff811730bf>] writeback_single_inode+0x9f/0x250
> [<ffffffff8117370b>] writeback_sb_inodes+0xcb/0x170
> [<ffffffff81174174>] writeback_inodes_wb+0xa4/0x170
> [<ffffffff8117450b>] wb_writeback+0x2cb/0x440
> [<ffffffff81035bb9>] ? default_spin_lock_flags+0x9/0x10
> [<ffffffff8158b3af>] ? _raw_spin_lock_irqsave+0x2f/0x40
> [<ffffffff811748ac>] wb_do_writeback+0x22c/0x280
> [<ffffffff811749aa>] bdi_writeback_thread+0xaa/0x260
> [<ffffffff81174900>] ? bdi_writeback_thread+0x0/0x260
> [<ffffffff81081b76>] kthread+0x96/0xa0
> [<ffffffff8100cda4>] kernel_thread_helper+0x4/0x10
> [<ffffffff81081ae0>] ? kthread+0x0/0xa0
> [<ffffffff8100cda0>] ? kernel_thread_helper+0x0/0x10
> Code: 4e 1c 00 85 c0 89 c1 0f 84 a7 00 00 00 49 89 de 45 31 ff 31 d2 0f 1f 44
> 00 00 49 8b 06 48 8b 38 48 85 ff 74 3d 40 f6 c7 01 75 54 <44> 8b 47 08 4c 8d 57
> 08 45 85 c0 74 e5 45 8d 48 01 44 89 c0 f0
> RIP [<ffffffff810f8d71>] find_get_pages+0x61/0x110
> RSP <ffff881fdee55800>
> CR2: 000000000040403c
> ---[ end trace 84193c2a431ae14b ]---

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs