Hello,
Last week, I encountered problems with XFS volumes on several machines.
The kernel hung under heavy load and I had to do a hard reset. After the
reboot, the XFS volume could not be mounted, and xfs_repair didn't
manage to recover the volume cleanly, on 2 different machines.
To put things in perspective: it wasn't production data, so it doesn't
really matter whether I recover the data or not. What matters more to me
is understanding why things went wrong...
I've been using XFS for a long time, on lots of data, and this is the
first time I've encountered such a problem. However, I was using an
unusual option (filestreams) and running kernel 3.6.1, so I wonder
whether either has something to do with the crash.
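For reference, this is roughly how the volumes were mounted with the
filestreams option enabled (device and mount point below are just
placeholders, not the real ones):

  # placeholder device/mount point; filestreams is a standard XFS mount option
  mount -t xfs -o filestreams /dev/sdb1 /mnt/ceph-osd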
I have nothing very conclusive in the kernel logs, apart from this:
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569890]
INFO: task ceph-osd:17856 blocked for more than 120 seconds.
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569941]
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569987]
ceph-osd D ffff88056416b1a0 0 17856 1 0x00000000
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.569993]
ffff88056416aed0 0000000000000086 ffff880590751fd8 ffff88000c67eb00
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570047]
ffff880590751fd8 ffff880590751fd8 ffff880590751fd8 ffff88056416aed0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570101]
0000000000000001 ffff88056416aed0 ffff880a15240d00 ffff880a15240d60
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570156] Call
Trace:
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570187]
[<ffffffff81041335>] ? exit_mm+0x85/0x120
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570216]
[<ffffffff81042a94>] ? do_exit+0x154/0x8e0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570248]
[<ffffffff8114ec79>] ? file_update_time+0xa9/0x100
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570278]
[<ffffffff81043568>] ? do_group_exit+0x38/0xa0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570309]
[<ffffffff81051bc6>] ? get_signal_to_deliver+0x1a6/0x5e0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570341]
[<ffffffff8100223e>] ? do_signal+0x4e/0x970
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570371]
[<ffffffff81170e2e>] ? fsnotify+0x24e/0x340
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570402]
[<ffffffff8100c995>] ? fpu_finit+0x15/0x30
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570431]
[<ffffffff8100db34>] ? restore_i387_xstate+0x64/0x1c0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570464]
[<ffffffff8108e0d2>] ? sys_futex+0x92/0x1b0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570493]
[<ffffffff81002bf5>] ? do_notify_resume+0x75/0xc0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570525]
[<ffffffff813c60fa>] ? int_signal+0x12/0x17
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570553]
INFO: task ceph-osd:17857 blocked for more than 120 seconds.
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570583]
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570628]
ceph-osd D ffff8801161fe720 0 17857 1 0x00000000
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570632]
ffff8801161fe450 0000000000000086 ffffffffffffffe0 ffff880a17c73c30
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570687]
ffff88011347ffd8 ffff88011347ffd8 ffff88011347ffd8 ffff8801161fe450
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570740]
ffff8801161fe450 ffff8801161fe450 ffff880a15240d00 ffff880a15240d60
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570794] Call
Trace:
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570818]
[<ffffffff81041335>] ? exit_mm+0x85/0x120
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570846]
[<ffffffff81042a94>] ? do_exit+0x154/0x8e0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570875]
[<ffffffff81043568>] ? do_group_exit+0x38/0xa0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570905]
[<ffffffff81051bc6>] ? get_signal_to_deliver+0x1a6/0x5e0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570935]
[<ffffffff8100223e>] ? do_signal+0x4e/0x970
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570967]
[<ffffffff81302d24>] ? sys_sendto+0x114/0x150
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.570996]
[<ffffffff8108e0d2>] ? sys_futex+0x92/0x1b0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571024]
[<ffffffff81002bf5>] ? do_notify_resume+0x75/0xc0
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571054]
[<ffffffff813c60fa>] ? int_signal+0x12/0x17
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571082]
INFO: task ceph-osd:17858 blocked for more than 120 seconds.
Oct 14 14:37:21 hanyu.u14.univ-nantes.prive kernel: [532905.571111]
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
I wasn't able to cleanly shut down the servers after that. On 2
machines, the XFS volumes (12 TB each) couldn't be mounted anymore after
the hard reset, and needed xfs_repair -L ...
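Zeroing the log was really a last resort; the usual order before falling
back to -L is roughly this (device path is a placeholder):

  # placeholder device; usual sequence before zeroing the log
  mount /dev/sdb1 /mnt        # try to replay the log first
  xfs_repair -n /dev/sdb1     # dry run, reports problems without writing
  xfs_repair /dev/sdb1        # refuses to run if the log is still dirty
  xfs_repair -L /dev/sdb1     # last resort: zeroes the log, recent metadata is lost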
On one machine, xfs_repair ran to completion, but with millions of
errors, and this is what was left in the end :(
344010712 /XCEPH-PROD/data/osd.8
6841649480 /XCEPH-PROD/data/lost+found/
I understand that xfs_repair -L always leads to some data loss, but surely not to that extent?
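To get a better idea of what actually ended up in lost+found versus the
surviving OSD directory, I can run something like this (same paths as
above):

  # sizes and file counts under lost+found vs. the surviving OSD directory
  du -sk /XCEPH-PROD/data/osd.8 /XCEPH-PROD/data/lost+found
  find /XCEPH-PROD/data/lost+found -type f | wc -l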
On the other machine, xfs_repair segfaults after lots of messages like
these (I mean, really a lot):
block (0,1008194-1008194) multiply claimed by cnt space tree, state - 2
block (0,1008200-1008200) multiply claimed by cnt space tree, state - 2
block (0,1012323-1012323) multiply claimed by cnt space tree, state - 2
...
agf_freeblks 87066179, counted 87066033 in ag 0
agi_freecount 489403, counted 488952 in ag 0
agi unlinked bucket 1 is 7681 in ag 0 (inode=7681)
agi unlinked bucket 5 is 67781 in ag 0 (inode=67781)
agi unlinked bucket 6 is 10950 in ag 0 (inode=10950)
...
block (3,30847085-30847085) multiply claimed by cnt space tree, state - 2
block (3,27384823-27384823) multiply claimed by cnt space tree, state - 2
block (3,30115747-30115747) multiply claimed by cnt space tree, state - 2
...
agf_freeblks 90336213, counted 302201427 in ag 3
agf_longest 6144, counted 167772160 in ag 3
inode chunk claims used block, inobt block - agno 3, bno 2380, inopb 16
inode chunk claims used block, inobt block - agno 3, bno 280918, inopb 16
...
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
found inodes not in the inode allocation tree
- process known inodes and perform inode discovery...
- agno = 0
7f1738c17700: Badness in key lookup (length)
bp=(bno 2848, len 16384 bytes) key=(bno 2848, len 8192 bytes)
7f1738c17700: Badness in key lookup (length)
bp=(bno 3840, len 16384 bytes) key=(bno 3840, len 8192 bytes)
7f1738c17700: Badness in key lookup (length)
bp=(bno 5456, len 16384 bytes) key=(bno 5456, len 8192 bytes)
...
and in the end, xfs_repair segfaults.
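If it would help in debugging the segfault, I can capture a metadata
image of the broken filesystem with something like the following (device
and output paths are placeholders):

  # placeholder paths; xfs_metadump copies only metadata, not file contents
  xfs_metadump /dev/sdc1 /tmp/brokenfs.metadump
  # a developer can then recreate a filesystem image with xfs_mdrestore
  xfs_mdrestore /tmp/brokenfs.metadump /tmp/brokenfs.img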
Those machines are part of a 12-machine Ceph cluster (Ceph itself is
pure user-space). All nodes are independent (not in the same computer
room), but all had been running 3.6.1 for a few days, and all were using
XFS with the filestreams option (I was trying to prevent XFS
fragmentation). Could this be related, since it's the first time I've
encountered such disastrous data loss?
I don't have many more relevant details, which makes this mail a poor
bug report...
If it matters, I can provide more details about the way those kernels
hung (Ceph node reweights, heavy stress on the hardware, lots of I/O),
about the servers and Fibre Channel disks, and so on.
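For instance, I can gather something like the following from the
affected machines (mount point, device and log path are placeholders):

  uname -a                      # exact kernel version
  xfs_info /mnt/ceph-osd        # filesystem geometry of the affected volume
  dmesg | grep -i xfs           # any XFS errors around the hang
  grep XFS /var/log/kern.log    # same, from the persisted syslog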
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@xxxxxxxxxxxxxx