Weird XFS Corruption Error

Hi everybody,

We experienced a weird XFS corruption yesterday, and I am desperately trying to find out what happened.
First, the setup:

* ProLiant DL380p Gen8
* 256GB RAM
* HP SmartArray P420i Controller
** 1 GB BBWC
** Firmware Version 4.68
** 20x MK0100GCTYU 100GB SSD Drives
** Raid 1+0
* LVM
* Ubuntu 12.04 LTS
* Kernel 3.11.0-15-generic #23~precise1-Ubuntu

fstab Entry: 
/dev/vg00/opt_mysqlbackup   /opt/mysqlbackup            xfs     nobarrier,noatime,nodiratime,logbufs=8,logbsize=256k       0 2

We created a 120GB LV, mounted on /opt/mysqlbackup, which temporarily hosts our MariaDB backups until they are transferred to tape. We use mylvmbackup (http://www.lenzg.net/mylvmbackup/) to create a tar.gz file (approx. 55GB) containing the dump. While testing, I created hardlinks for 2 files in a subdir ("safe") and forgot about them for a day, while the "original" files were deleted and replaced by the next day's backup.
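
To make the hardlink situation concrete, here is a minimal Python sketch of what is expected to happen on a healthy filesystem: the hardlink keeps the old inode alive after the original name is deleted and recreated. The directory layout and file contents are made up for illustration (only the name daily_snapshot.tar.gz and the "safe" subdir echo the setup above), and the sketch runs in a temporary directory, not on the affected LV.

```python
import os
import tempfile


def hardlink_survives_replacement() -> dict:
    """Link a file, delete and recreate the original name, then inspect the link."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "daily_snapshot.tar.gz")
        safe_dir = os.path.join(root, "safe")
        os.mkdir(safe_dir)
        linked = os.path.join(safe_dir, "daily_snapshot.tar.gz")

        # Day 1: write the backup and hardlink it into safe/.
        with open(original, "w") as f:
            f.write("backup from day 1")
        os.link(original, linked)
        links_before = os.stat(linked).st_nlink

        # Day 2: the original name is removed and a new backup takes its place.
        os.remove(original)
        with open(original, "w") as f:
            f.write("backup from day 2")

        # The link still points at the day-1 inode, which now has one name left.
        with open(linked) as f:
            linked_content = f.read()
        return {
            "links_before": links_before,
            "links_after": os.stat(linked).st_nlink,
            "linked_content": linked_content,
            "same_inode": os.stat(linked).st_ino == os.stat(original).st_ino,
        }


if __name__ == "__main__":
    print(hardlink_survives_replacement())
```

In other words, removing safe/ later should simply drop the last reference to the old inodes and free their extents, which is the point at which the filesystem has to read their extent maps.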

When I tried cleaning up the no longer needed files, I encountered the following:

---------------------------------------------------------
me@hsoi-gts3-de02:/opt/mysqlbackup$ sudo rm -rf safe/
[sudo] password for saskani:
rm: cannot remove `safe/daily_snapshot.tar.gz.md5': Input/output error
---------------------------------------------------------

dmesg told me:
---------------------------------------------------------
[964199.138848] XFS (dm-8): Internal error xfs_bmbt_read_verify at line 789 of file /build/buildd/linux-lts-saucy-3.11.0/fs/xfs/xfs_bmap_btree.c.  Caller 0xffffffffa0164495
[964199.138848]
[964199.138850] CPU: 1 PID: 3694 Comm: kworker/1:1H Tainted: GF            3.11.0-15-generic #23~precise1-Ubuntu
[964199.138851] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 09/18/2013
[964199.138874] Workqueue: xfslogd xfs_buf_iodone_work [xfs]
[964199.138876]  0000000000000001 ffff881c6be6fd18 ffffffff8173bc0e 0000000000004364
[964199.138878]  ffff883f9061c000 ffff881c6be6fd38 ffffffffa016629f ffffffffa0164495
[964199.138879]  0000000000000001 ffff881c6be6fd78 ffffffffa016630e ffff881c6be6fda8
[964199.138880] Call Trace:
[964199.138886]  [<ffffffff8173bc0e>] dump_stack+0x46/0x58
[964199.138906]  [<ffffffffa016629f>] xfs_error_report+0x3f/0x50 [xfs]
[964199.138913]  [<ffffffffa0164495>] ? xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138921]  [<ffffffffa016630e>] xfs_corruption_error+0x5e/0x90 [xfs]
[964199.138928]  [<ffffffffa0164495>] ? xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138939]  [<ffffffffa01944d6>] xfs_bmbt_read_verify+0x76/0xf0 [xfs]
[964199.138946]  [<ffffffffa0164495>] ? xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138949]  [<ffffffff81095bb2>] ? finish_task_switch+0x52/0xf0
[964199.138969]  [<ffffffffa0164495>] xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138972]  [<ffffffff81081060>] process_one_work+0x170/0x4a0
[964199.138973]  [<ffffffff81082121>] worker_thread+0x121/0x390
[964199.138975]  [<ffffffff81082000>] ? manage_workers.isra.21+0x170/0x170
[964199.138977]  [<ffffffff81088fe0>] kthread+0xc0/0xd0
[964199.138979]  [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
[964199.138981]  [<ffffffff817508ac>] ret_from_fork+0x7c/0xb0
[964199.138983]  [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
[964199.138984] XFS (dm-8): Corruption detected. Unmount and run xfs_repair
[964199.139014] XFS (dm-8): metadata I/O error: block 0x1f0 ("xfs_trans_read_buf_map") error 117 numblks 8
[964199.139016] XFS (dm-8): xfs_do_force_shutdown(0x1) called from line 367 of file /build/buildd/linux-lts-saucy-3.11.0/fs/xfs/xfs_trans_buf.c.  Return address = 0xffffffffa01cadbc
[964199.139324] XFS (dm-8): I/O Error Detected. Shutting down filesystem
[964199.139325] XFS (dm-8): Please umount the filesystem and rectify the problem(s)
[964212.367300] XFS (dm-8): xfs_log_force: error 5 returned.
[964242.477283] XFS (dm-8): xfs_log_force: error 5 returned.
---------------------------------------------------------
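
For reference, the "metadata I/O error" line above can be decoded with a few lines of Python. This is only a sketch under two assumptions: that the block number and numblks are given in 512-byte basic blocks relative to the start of the device (dm-8 here), and that the error numbers follow the generic Linux errno table, where 5 is EIO and 117 is EUCLEAN ("Structure needs cleaning"). The function name and the small errno table are my own, not part of any XFS tool.

```python
import re

# Assumption: XFS reports disk addresses in 512-byte "basic blocks".
BASIC_BLOCK_BYTES = 512

# Assumption: generic Linux errno numbering (differs on a few architectures).
LINUX_ERRNO_NAMES = {5: "EIO", 117: "EUCLEAN"}

METADATA_ERROR = re.compile(
    r'metadata I/O error: block (0x[0-9a-fA-F]+) \("([^"]+)"\) '
    r"error (\d+) numblks (\d+)"
)


def decode_metadata_io_error(line: str):
    """Extract location, length and errno from an XFS metadata I/O error line."""
    match = METADATA_ERROR.search(line)
    if match is None:
        return None
    daddr = int(match.group(1), 16)
    errno_value = int(match.group(3))
    numblks = int(match.group(4))
    return {
        "daddr": daddr,
        "byte_offset": daddr * BASIC_BLOCK_BYTES,
        "length_bytes": numblks * BASIC_BLOCK_BYTES,
        "caller": match.group(2),
        "errno": errno_value,
        "errno_name": LINUX_ERRNO_NAMES.get(errno_value, "unknown"),
    }


if __name__ == "__main__":
    sample = (
        '[964199.139014] XFS (dm-8): metadata I/O error: block 0x1f0 '
        '("xfs_trans_read_buf_map") error 117 numblks 8'
    )
    print(decode_metadata_io_error(sample))
```

Under those assumptions, the failing buffer is a single 4096-byte block starting 253952 bytes into the LV, and error 117 is the same EUCLEAN that mount later reports as "Structure needs cleaning"; the subsequent xfs_log_force error 5 is a plain EIO after the shutdown.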

After that, I tried the following (in order):

1. ran xfs_repair, which did not find the superblock and started scanning the LV. After finding the secondary superblock, it told me there was still something in the log, so I
2. mounted the filesystem, which gave me a "Structure needs cleaning" error after a couple of seconds
3. tried mounting again for good measure, with the same "Structure needs cleaning" error
4. ran xfs_repair -L, which repaired everything and effectively wiped my filesystem in the process
5. mounted the filesystem, only to find it empty.



Since then, I have been desperately trying to reproduce the problem, but unfortunately to no avail. Can somebody give some insight into the errors I encountered? I have previously operated 4.5PB worth of XFS filesystems for 3 years and never got an error similar to this.

Best regards
Sascha

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
