Re: Corrupted files

Sean Caron <scaron@xxxxxxxxx> · Tue, 9 Sep 2014 21:25:37 -0400

Hi Leslie,
You really don't want to be running "green" anything in an array; that is a ticking time bomb just waiting to go off. Let me tell you: at my installation, a predecessor had procured a large number of green drives because they were very inexpensive, and regrets were had by all. Lousy performance, lots of spurious ejections and other RAID gremlins, and the failure rate on the WDC Greens is just appalling.

BBWC stands for Battery-Backed Write Cache. It is a feature of hardware RAID cards, and it is just what it says on the tin: a bit of battery-backed, effectively nonvolatile cache (usually half a gig, a gig, or two) that retains writes destined for the array in case of power failure, etc. If you have BBWC enabled but your battery is dead, bad things can happen. It is not applicable to JBOD software RAID.

I hold firm to my beliefs on xfs_repair :) As I say, you'll see a variety of opinions here. 

Best,

Sean

On Tue, Sep 9, 2014 at 9:12 PM, Leslie Rhorer <lrhorer@xxxxxxxxxxxx> wrote:
On 9/9/2014 5:06 PM, Dave Chinner wrote:

Firstly, more information is required, namely versions and actual error messages:

        Indubitably:

RAID-Server:/# xfs_repair -V
xfs_repair version 3.1.7
RAID-Server:/# uname -r
3.2.0-4-amd64

4.0 GHz FX-8350 eight-core processor

RAID-Server:/# cat /proc/meminfo /proc/mounts /proc/partitions
MemTotal:        8099916 kB
MemFree:         5786420 kB
Buffers:          112684 kB
Cached:           457020 kB
SwapCached:            0 kB
Active:           521800 kB
Inactive:         457268 kB
Active(anon):     276648 kB
Inactive(anon):   140180 kB
Active(file):     245152 kB
Inactive(file):   317088 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      12623740 kB
SwapFree:       12623740 kB
Dirty:                20 kB
Writeback:             0 kB
AnonPages:        409488 kB
Mapped:            47576 kB
Shmem:              7464 kB
Slab:             197100 kB
SReclaimable:     112644 kB
SUnreclaim:        84456 kB
KernelStack:        2560 kB
PageTables:         8468 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    16673696 kB
Committed_AS:    1010172 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      339140 kB
VmallocChunk:   34359395308 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       65532 kB
DirectMap2M:     5120000 kB
DirectMap1G:     3145728 kB
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=10240k,nr_inodes=1002653,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=809992k,mode=755 0 0
/dev/disk/by-uuid/fa5c404a-bfcb-43de-87ed-e671fda1ba99 / ext4 rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=4144720k 0 0
/dev/md1 /boot ext2 rw,relatime,errors=continue 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
Backup:/Backup /Backup nfs rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.51,mountvers=3,mountport=39597,mountproto=tcp,local_lock=none,addr=192.168.1.51 0 0
Backup:/var/www /var/www/backup nfs rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.51,mountvers=3,mountport=39597,mountproto=tcp,local_lock=none,addr=192.168.1.51 0 0
/dev/md0 /RAID xfs rw,relatime,attr2,delaylog,sunit=2048,swidth=12288,noquota 0 0

major minor  #blocks  name

   8        0  125034840 sda
   8        1      96256 sda1
   8        2  112305152 sda2
   8        3   12632064 sda3
   8       16  125034840 sdb
   8       17      96256 sdb1
   8       18  112305152 sdb2
   8       19   12632064 sdb3
   8       48 3907018584 sdd
   8       32 3907018584 sdc
   8       64 1465138584 sde
   8       80 1465138584 sdf
   8       96 1465138584 sdg
   8      112 3907018584 sdh
   8      128 3907018584 sdi
   8      144 3907018584 sdj
   8      160 3907018584 sdk
   9        1      96192 md1
   9        2  112239488 md2
   9        3   12623744 md3
   9        0 23441319936 md0
   9       10 4395021312 md10

RAID-Server:/# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1] [raid0]
md10 : active raid0 sdf[0] sde[2] sdg[1]
      4395021312 blocks super 1.2 512k chunks

md0 : active raid6 md10[12] sdc[13] sdk[10] sdj[11] sdi[15] sdh[8] sdd[9]
      23441319936 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [8/7] [UUU_UUUU]
      bitmap: 29/30 pages [116KB], 65536KB chunk

md3 : active (auto-read-only) raid1 sda3[0] sdb3[1]
      12623744 blocks super 1.2 [3/2] [UU_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md2 : active raid1 sda2[0] sdb2[1]
      112239488 blocks super 1.2 [3/2] [UU_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md1 : active raid1 sda1[0] sdb1[1]
      96192 blocks [3/2] [UU_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

        Six of the drives are 4T spindles (a mixture of makes and models).  The three drives comprising md10 are WD 1.5T green drives.  These are in place to take over the function of one of the kicked 4T drives.  md1, md2, and md3 are not data arrays and are not suffering any issues.
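As a quick cross-check of the paragraph above, here is a small Python sketch (purely illustrative; the helpers `degraded_slots` and `per_member_kib` are hypothetical names, not part of any md tool). It reads the `[8/7] [UUU_UUUU]` field from the md0 line of /proc/mdstat and compares md10's size with the per-member size md0 needs. It assumes the usual RAID6 relation that the array holds (members - 2) x member size, with every size in the 1 KiB blocks that /proc/mdstat and /proc/partitions report.

```python
import re
from typing import List, Tuple

# Figures copied from /proc/mdstat and /proc/partitions above (1 KiB blocks).
MD0_LINE = ("23441319936 blocks super 1.2 level 6, 1024k chunk, "
            "algorithm 2 [8/7] [UUU_UUUU]")
MD0_KIB = 23441319936
MD10_KIB = 4395021312      # RAID0 of the three 1.5T green drives
FOUR_TB_KIB = 3907018584   # raw size of each 4T disk (e.g. sdc)


def degraded_slots(status_line: str) -> Tuple[int, int, List[int]]:
    """Return (slots, active, missing positions) from an md '[n/m] [U_]' field."""
    match = re.search(r"\[(\d+)/(\d+)\] \[([U_]+)\]", status_line)
    if match is None:
        raise ValueError("no [n/m] [U_] status field found")
    slots, active, flags = int(match.group(1)), int(match.group(2)), match.group(3)
    # Each '_' marks a member slot with no working device.
    missing = [pos for pos, flag in enumerate(flags) if flag == "_"]
    return slots, active, missing


def per_member_kib(array_kib: int, members: int, parity: int = 2) -> int:
    """Per-member size for RAID5/6: array size divided by the data members."""
    return array_kib // (members - parity)


if __name__ == "__main__":
    slots, active, missing = degraded_slots(MD0_LINE)
    print(f"md0: {active}/{slots} members active, missing position(s) {missing}")
    need = per_member_kib(MD0_KIB, slots)
    print(f"each md0 member must provide {need} KiB")
    print(f"md10 large enough: {MD10_KIB >= need}")
    print(f"4T disk large enough: {FOUR_TB_KIB >= need}")
```

With these figures the sketch reports one empty slot (position 3 of 8, counting from 0) and shows md10 (4395021312 KiB) comfortably exceeding the 3906886656 KiB each md0 member must provide.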

        I'm not sure what is meant by "write cache status" in this context. The machine has been rebooted more than once during recovery, and the FS has been unmounted and xfs_repair has been run several times.

        I don't know what the acronym BBWC stands for.

RAID-Server:/# xfs_info /dev/md0
meta-data=/dev/md0               isize=256    agcount=43, agsize=137356288 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=5860329984, imaxpct=5
         =                       sunit=256    swidth=1536 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
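The stripe settings appear in two different units, which makes them easy to misread: `sunit=`/`swidth=` in /proc/mounts are in 512-byte sectors, while xfs_info reports them in 4096-byte filesystem blocks. The Python sketch below uses only numbers copied from the /proc/mounts, /proc/mdstat, /proc/partitions and xfs_info output above to check that they describe one consistent geometry. The expectation that swidth equals (8 - 2) data disks x sunit is an assumption based on md0 being an 8-device RAID6; `check_geometry` is a hypothetical helper name.

```python
# All numbers are copied from the output quoted in this thread.
SECTOR = 512                  # unit of sunit=/swidth= in /proc/mounts
FS_BLOCK = 4096               # xfs_info bsize

MD_CHUNK_KIB = 1024           # /proc/mdstat: "1024k chunk"
MD_MEMBERS, RAID6_PARITY = 8, 2
MD0_KIB = 23441319936         # /proc/partitions: md0

MOUNT_SUNIT, MOUNT_SWIDTH = 2048, 12288      # /proc/mounts, in sectors
XFS_SUNIT, XFS_SWIDTH = 256, 1536            # xfs_info, in fs blocks
XFS_BLOCKS = 5860329984                      # xfs_info data blocks
AGCOUNT, AGSIZE = 43, 137356288              # xfs_info, agsize in fs blocks


def check_geometry() -> dict:
    data_disks = MD_MEMBERS - RAID6_PARITY
    chunk_bytes = MD_CHUNK_KIB * 1024
    return {
        # The stripe unit should equal one md chunk.
        "sunit_matches_chunk": XFS_SUNIT * FS_BLOCK == chunk_bytes,
        # The stripe width should span all data disks (RAID6 loses two to parity).
        "swidth_spans_data_disks": XFS_SWIDTH == data_disks * XFS_SUNIT,
        # /proc/mounts and xfs_info should describe the same byte sizes.
        "mount_and_xfs_info_agree": (
            MOUNT_SUNIT * SECTOR == XFS_SUNIT * FS_BLOCK
            and MOUNT_SWIDTH * SECTOR == XFS_SWIDTH * FS_BLOCK
        ),
        # The filesystem should use the whole md0 device.
        "fs_fills_md0": XFS_BLOCKS * FS_BLOCK == MD0_KIB * 1024,
        # 43 AGs: the first 42 are full-size, the last one is partial.
        "agcount_consistent": (AGCOUNT - 1) * AGSIZE < XFS_BLOCKS <= AGCOUNT * AGSIZE,
    }


if __name__ == "__main__":
    for name, ok in check_geometry().items():
        print(f"{name}: {ok}")
```

Every check comes out True for the values in this thread: a 1 MiB stripe unit matching the 1024k md chunk, a six-data-disk stripe width, and a filesystem that exactly fills md0.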

        The system performs just fine, apart from the problem described above, with loads in excess of 3Gbps.  That is internal only.  The LAN link is only 1Gbps, so no external request exceeds about 950Mbps.

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

dmesg, in particular, should tell us what corruption is being encountered when stat fails.

RAID-Server:/# ls "/RAID/DVD/Big Sleep, The (1945)/VIDEO_TS/VTS_01_1.VOB"
ls: cannot access /RAID/DVD/Big Sleep, The (1945)/VIDEO_TS/VTS_01_1.VOB: Structure needs cleaning
RAID-Server:/# dmesg | tail -n 30
...
[192173.363981] XFS (md0): corrupt dinode 41006, extent total = 1, nblocks = 0.
[192173.363988] ffff8802338b8e00: 49 4e 81 b6 02 02 00 00 00 00 03 e8 00 00 03 e8  IN..............
[192173.363996] XFS (md0): Internal error xfs_iformat(1) at line 319 of file /build/linux-eKuxrT/linux-3.2.60/fs/xfs/xfs_inode.c.  Caller 0xffffffffa0509318
[192173.363999]
[192173.364062] Pid: 10813, comm: ls Not tainted 3.2.0-4-amd64 #1 Debian 3.2.60-1+deb7u3
[192173.364065] Call Trace:
[192173.364097]  [<ffffffffa04d3731>] ? xfs_corruption_error+0x54/0x6f [xfs]
[192173.364134]  [<ffffffffa0509318>] ? xfs_iread+0x9f/0x177 [xfs]
[192173.364170]  [<ffffffffa0508efa>] ? xfs_iformat+0xe3/0x462 [xfs]
[192173.364204]  [<ffffffffa0509318>] ? xfs_iread+0x9f/0x177 [xfs]
[192173.364240]  [<ffffffffa0509318>] ? xfs_iread+0x9f/0x177 [xfs]
[192173.364268]  [<ffffffffa04d6ebe>] ? xfs_iget+0x37c/0x56c [xfs]
[192173.364300]  [<ffffffffa04e13b4>] ? xfs_lookup+0xa4/0xd3 [xfs]
[192173.364328]  [<ffffffffa04d9e5a>] ? xfs_vn_lookup+0x3f/0x7e [xfs]
[192173.364344]  [<ffffffff81102de9>] ? d_alloc_and_lookup+0x3a/0x60
[192173.364357]  [<ffffffff8110388d>] ? walk_component+0x219/0x406
[192173.364370]  [<ffffffff81104721>] ? path_lookupat+0x7c/0x2bd
[192173.364383]  [<ffffffff81036628>] ? should_resched+0x5/0x23
[192173.364396]  [<ffffffff8134f144>] ? _cond_resched+0x7/0x1c
[192173.364408]  [<ffffffff8110497e>] ? do_path_lookup+0x1c/0x87
[192173.364420]  [<ffffffff81106407>] ? user_path_at_empty+0x47/0x7b
[192173.364434]  [<ffffffff813533d8>] ? do_page_fault+0x30a/0x345
[192173.364448]  [<ffffffff810d6a04>] ? mmap_region+0x353/0x44a
[192173.364460]  [<ffffffff810fe45a>] ? vfs_fstatat+0x32/0x60
[192173.364471]  [<ffffffff810fe590>] ? sys_newstat+0x12/0x2b
[192173.364483]  [<ffffffff813509f5>] ? page_fault+0x25/0x30
[192173.364495]  [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b
[192173.364503] XFS (md0): Corruption detected. Unmount and run xfs_repair

        That last line, by the way, is why I ran umount and xfs_repair.
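The hex dump in the dmesg output is the start of the on-disk inode. The Python sketch below decodes those 16 bytes, assuming the field layout of the kernel's `struct xfs_dinode` (magic, mode, version, format, old link count, uid, gid, all big-endian). It also models, as an assumption about the 3.2-era `xfs_iformat()` sanity check, why the kernel complains: an inode whose extent total exceeds its block count cannot be valid. The helpers `decode_dinode_head` and `extent_total_ok` are hypothetical names for illustration only.

```python
import stat
import struct

# First 16 bytes of the inode, copied from the dmesg hex dump above.
DUMP = "49 4e 81 b6 02 02 00 00 00 00 03 e8 00 00 03 e8"

# di_format values as we understand the XFS on-disk format (assumption).
DI_FORMATS = {0: "dev", 1: "local", 2: "extents", 3: "btree", 4: "uuid"}


def decode_dinode_head(hex_dump: str) -> dict:
    """Decode magic, mode, version, format, onlink, uid, gid (big-endian)."""
    raw = bytes.fromhex(hex_dump)
    magic, mode, version, fmt, _onlink, uid, gid = struct.unpack(">HHBBHII", raw[:16])
    return {
        "magic_ok": magic == 0x494E,          # ASCII "IN"
        "regular_file": stat.S_ISREG(mode),
        "permissions": oct(stat.S_IMODE(mode)),
        "version": version,
        "format": DI_FORMATS.get(fmt, f"unknown({fmt})"),
        "uid": uid,
        "gid": gid,
    }


def extent_total_ok(extent_total: int, nblocks: int) -> bool:
    """Hypothetical model of the check behind 'extent total = 1, nblocks = 0':
    every extent covers at least one block, so extents cannot outnumber blocks."""
    return extent_total <= nblocks


if __name__ == "__main__":
    print(decode_dinode_head(DUMP))
    print("counters consistent:", extent_total_ok(1, 0))
```

Decoded this way, the visible fields look sane ("IN" magic, regular file, mode 0666, extent format, uid/gid 1000); what the kernel objects to is the extent total of 1 against nblocks of 0, counters stored further into the inode than the bytes shown.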

_______________________________________________

xfs mailing list

xfs@xxxxxxxxxxx

http://oss.sgi.com/mailman/listinfo/xfs