xfs corruption issue

Hello Dave,
My name is Danny Shavit and I am with Zadara Storage.
We would appreciate your feedback regarding an XFS corruption and xfs_repair issue.

We found a corrupted XFS volume on one of our systems. It is about 1 TB in size and holds about 12 million files.
We ran xfs_repair on the volume, and it succeeded after 42 minutes.
We noticed that memory consumption rose to about 7.5 GB.
Since some customers have only 4 GB of RAM (and sometimes only 2 GB), we tried running "xfs_repair -m 3200" on a machine with 4 GB of RAM.
However, this time an OOM event occurred in phase 3, while AG 26 was being processed.
The xfs_repair log is included below.
We would appreciate your feedback on how much memory xfs_repair needs in general, and specifically when the "-m" option is used.
The xfs metadata dump (prior to xfs_repair) can be found here:
https://zadarastorage-public.s3.amazonaws.com/xfs/xfsdump-prod-ebs_2015-03-30_23-00-38.tgz
It is a 1.2 GB file (5.7 GB uncompressed).

We would also appreciate your feedback on the corruption pattern.
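
As a possible starting point for looking at the pattern, the four "bad . entry" lines from phase 3 of the log below can be compared numerically. This is only a rough sketch: the function name and regular expression are illustrative, not part of xfs_repair, and the snippet only reports the difference between each directory's own inode number and the value its "." entry held. It does not establish a cause.

```python
import re

# The four "bad . entry" lines, copied verbatim from phase 3 of the log.
LOG = """\
bad . entry in directory inode 5691013154, was 5691013170: correcting
bad . entry in directory inode 5691013156, was 5691013172: correcting
bad . entry in directory inode 5691013157, was 5691013173: correcting
bad . entry in directory inode 5691013163, was 5691013179: correcting
"""

# Illustrative pattern: captures the directory inode and the stored "." value.
PATTERN = re.compile(r"bad \. entry in directory inode (\d+), was (\d+)")


def dot_entry_deltas(text):
    """Return (inode, stored_value, stored_value - inode) per matching line."""
    return [(int(ino), int(was), int(was) - int(ino))
            for ino, was in PATTERN.findall(text)]


for inode, stored, delta in dot_entry_deltas(LOG):
    print(f"inode {inode}: '.' held {stored} (delta {delta:+d})")
```

For these four lines the stored value is higher than the directory's own inode number by the same amount each time.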
--
Thank you,
Danny Shavit
Zadarastorage

---------- xfs_repair log  ----------------
root@vsa-00000428-vc-1:/export/4xfsdump# date; xfs_repair -v /dev/dm-55; date                                                               
Tue Mar 31 02:28:04 PDT 2015
Phase 1 - find and verify superblock...
        - block cache size set to 735288 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1920 tail block 1920
        - scan filesystem freespace and inode maps...
agi_freecount 54, counted 55 in ag 7
sb_ifree 947, counted 948
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
bad . entry in directory inode 5691013154, was 5691013170: correcting
bad . entry in directory inode 5691013156, was 5691013172: correcting
bad . entry in directory inode 5691013157, was 5691013173: correcting
bad . entry in directory inode 5691013163, was 5691013179: correcting
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26   (Danny: OOM occurred here with -m 3200)
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
entry "SavedXML" in dir inode 2992927241 inconsistent with .. value (4324257659) in ino 5691013156
        will clear entry "SavedXML"
rebuilding directory inode 2992927241
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
entry "Out" in dir inode 4324257659 inconsistent with .. value (2992927241) in ino 5691013172
        will clear entry "Out"
rebuilding directory inode 4324257659
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
entry "tocs_file" in dir inode 5691012138 inconsistent with .. value (3520464676) in ino 5691013154
        will clear entry "tocs_file"
entry "trees.log" in dir inode 5691012138 inconsistent with .. value (3791956240) in ino 5691013155
        will clear entry "trees.log"
rebuilding directory inode 5691012138
entry "filelist.xml" in directory inode 5691012139 not consistent with .. value (1909707067) in inode 5691013157,
junking entry
fixing i8count in inode 5691012139
entry "image001.jpg" in directory inode 5691012140 not consistent with .. value (2450176033) in inode 5691013163,
junking entry
fixing i8count in inode 5691012140
entry "OCR" in dir inode 5691013154 inconsistent with .. value (5691013170) in ino 1909707065
        will clear entry "OCR"
entry "Tmp" in dir inode 5691013154 inconsistent with .. value (5691013170) in ino 2179087403
        will clear entry "Tmp"
entry "images" in dir inode 5691013154 inconsistent with .. value (5691013170) in ino 2450176007
        will clear entry "images"
rebuilding directory inode 5691013154
entry "286_Kellman_Hoffer_Master.pdf_files" in dir inode 5691013156 inconsistent with .. value (5691013172) in ino 834535727
        will clear entry "286_Kellman_Hoffer_Master.pdf_files"
rebuilding directory inode 5691013156
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 834535727, moving to lost+found
disconnected dir inode 1909707065, moving to lost+found
disconnected dir inode 2179087403, moving to lost+found
disconnected dir inode 2450176007, moving to lost+found
disconnected dir inode 5691013154, moving to lost+found
disconnected dir inode 5691013155, moving to lost+found
disconnected dir inode 5691013156, moving to lost+found
disconnected dir inode 5691013157, moving to lost+found
disconnected dir inode 5691013163, moving to lost+found
disconnected dir inode 5691013172, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 81777983 nlinks from 2 to 12
resetting inode 1909210410 nlinks from 1 to 2
resetting inode 1909707067 nlinks from 3 to 2
resetting inode 2450176033 nlinks from 18 to 17
resetting inode 2992927241 nlinks from 13 to 12
resetting inode 3520464676 nlinks from 13 to 12
resetting inode 3791956240 nlinks from 13 to 12
resetting inode 4324257659 nlinks from 13 to 12
resetting inode 5691013154 nlinks from 5 to 2
resetting inode 5691013156 nlinks from 3 to 2

        XFS_REPAIR Summary    Tue Mar 31 03:11:00 2015

Phase           Start           End             Duration
Phase 1:        03/31 02:28:04  03/31 02:28:05  1 second
Phase 2:        03/31 02:28:05  03/31 02:28:42  37 seconds
Phase 3:        03/31 02:28:42  03/31 02:48:29  19 minutes, 47 seconds
Phase 4:        03/31 02:48:29  03/31 02:55:40  7 minutes, 11 seconds
Phase 5:        03/31 02:55:40  03/31 02:55:43  3 seconds
Phase 6:        03/31 02:55:43  03/31 03:10:57  15 minutes, 14 seconds
Phase 7:        03/31 03:10:57  03/31 03:10:57

Total run time: 42 minutes, 53 seconds
done
Tue Mar 31 03:11:01 PDT 2015
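
To see where the run time went, the per-phase durations in the summary table above can be added up and compared with the reported total. The sketch below is a hypothetical helper, not an xfs_repair feature. It assumes durations are always printed as "N hours/minutes/seconds" and treats a missing duration (as for Phase 7) as zero.

```python
import re

# Summary table copied from the xfs_repair output above.
SUMMARY = """\
Phase 1:        03/31 02:28:04  03/31 02:28:05  1 second
Phase 2:        03/31 02:28:05  03/31 02:28:42  37 seconds
Phase 3:        03/31 02:28:42  03/31 02:48:29  19 minutes, 47 seconds
Phase 4:        03/31 02:48:29  03/31 02:55:40  7 minutes, 11 seconds
Phase 5:        03/31 02:55:40  03/31 02:55:43  3 seconds
Phase 6:        03/31 02:55:43  03/31 03:10:57  15 minutes, 14 seconds
Phase 7:        03/31 03:10:57  03/31 03:10:57
"""

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def duration_seconds(line):
    """Sum every 'N unit' pair on the line; a line with no pairs yields 0."""
    pairs = re.findall(r"(\d+) (second|minute|hour)s?\b", line)
    return sum(int(n) * UNIT_SECONDS[unit] for n, unit in pairs)


per_phase = [duration_seconds(line) for line in SUMMARY.splitlines()]
total = sum(per_phase)

for number, seconds in enumerate(per_phase, start=1):
    print(f"Phase {number}: {seconds:5d} s ({100 * seconds / total:4.1f}%)")

minutes, seconds = divmod(total, 60)
print(f"Total: {minutes} minutes, {seconds} seconds")
```

The summed durations agree with the "Total run time" line, and phases 3 and 6 together account for most of the run.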