Hi Eric,
Thank you for your comments.
Yes, we made the ACL limit change, fully aware that it breaks compatibility
with the mainline kernel, both current and future. We mount our XFS
filesystems only with our own kernel. We are also aware that this change
will need to be carefully forward-ported when we move to a newer kernel.
I have an additional question regarding the latest XFS corruption report:
kernel: [3507105.314446] Pid: 25231, comm: kworker/0:0H Tainted: GF W O 3.8.13-030813-generic #201305111843
kernel: [3507105.314449] Call Trace:
kernel: [3507105.314487] [<ffffffffa0631baf>] xfs_error_report+0x3f/0x50 [xfs]
kernel: [3507105.314502] [<ffffffffa064e9ce>] ? xfs_allocbt_read_verify+0xe/0x10 [xfs]
kernel: [3507105.314514] [<ffffffffa0631c1e>] xfs_corruption_error+0x5e/0x90 [xfs]
kernel: [3507105.314528] [<ffffffffa064e862>] xfs_allocbt_verify+0x92/0x1e0 [xfs]
kernel: [3507105.314540] [<ffffffffa064e9ce>] ? xfs_allocbt_read_verify+0xe/0x10 [xfs]
kernel: [3507105.314547] [<ffffffff810135aa>] ? __switch_to+0x12a/0x4a0
kernel: [3507105.314551] [<ffffffff81096cd8>] ? set_next_entity+0xa8/0xc0
kernel: [3507105.314566] [<ffffffffa064e9ce>] xfs_allocbt_read_verify+0xe/0x10 [xfs]
kernel: [3507105.315251] [<ffffffffa062f48f>] xfs_buf_iodone_work+0x3f/0xa0 [xfs]
kernel: [3507105.315255] [<ffffffff81078b81>] process_one_work+0x141/0x490
kernel: [3507105.315257] [<ffffffff81079b48>] worker_thread+0x168/0x400
kernel: [3507105.315259] [<ffffffff810799e0>] ? manage_workers+0x120/0x120
kernel: [3507105.315262] [<ffffffff8107f050>] kthread+0xc0/0xd0
kernel: [3507105.315265] [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
kernel: [3507105.315270] [<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0
kernel: [3507105.315273] [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
kernel: [3507105.315275] XFS (dm-39): Corruption detected. Unmount and run xfs_repair
kernel: [3507105.316706] XFS (dm-39): metadata I/O error: block 0x41a6eff8 ("xfs_trans_read_buf_map") error 117 numblks 8
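(As a side note on reading the last line: "error 117" is errno 117, which on Linux is EUCLEAN, "Structure needs cleaning", the value XFS reports for corrupted metadata. A quick way to confirm the decoding, assuming a Linux errno table:

```python
import errno
import os

# errno 117 on Linux is EUCLEAN ("Structure needs cleaning"),
# the errno value XFS surfaces for corrupted metadata.
print(errno.errorcode[117])  # EUCLEAN
print(os.strerror(117))      # Structure needs cleaning
```
)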
From looking at the XFS code, it appears that XFS read a metadata block from
disk and discovered that it was corrupted. At that point the system was
rebooted, and after the reboot we prevented this particular XFS filesystem
from mounting. We then ran xfs_metadump and xfs_repair. The latter found no
issues at all, and the filesystem subsequently mounted and continued
operating without problems.
Can you think of a way to explain this?
Can you confirm that the above trace really means that XFS was reading its
metadata from disk?
From the XFS code, I see that XFS does not use the Linux page cache for its
metadata (unlike btrfs, for example). Is my understanding correct?
(Otherwise, I could assume that something wrongly touched a page in the
page cache and corrupted its in-memory content.)
Thanks,
Alex.
-----Original Message-----
From: Eric Sandeen
Sent: 03 September, 2015 6:14 PM
To: Danny Shavit
Cc: Alex Lyakas ; xfs@xxxxxxxxxxx
Subject: Re: xfs corruption
On 9/3/15 9:55 AM, Eric Sandeen wrote:
On 9/3/15 9:26 AM, Danny Shavit wrote:
...
We are using a modified XFS. Mainly, we added some reporting features and
changed the discard operation to be aligned with the chunk sizes used in our
systems. The modified code resides at
https://github.com/zadarastorage/zadara-xfs-pushback.
Interesting, thanks for the pointer. I guess at this point I have to
ask, do you see these same problems without your modifications?
Have you ever mounted this filesystem on non-zadara kernels?
looking at
https://github.com/zadarastorage/zadara-xfs-pushback/commit/094df949fd080ede546bb7518405ab873a444823
you've changed the disk format w/o adding a feature flag,
which is pretty dangerous.
-Eric
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs