On 9/9/2014 8:53 PM, Dave Chinner wrote:
On Tue, Sep 09, 2014 at 08:12:38PM -0500, Leslie Rhorer wrote:
On 9/9/2014 5:06 PM, Dave Chinner wrote:
Firstly, more information is required, namely versions and actual
error messages:
Indubitably:
RAID-Server:/# xfs_repair -V
xfs_repair version 3.1.7
RAID-Server:/# uname -r
3.2.0-4-amd64
Ok, so a relatively old xfs_repair. That's important - read on....
OK, a good reason is a good reason.
4.0 GHz FX-8350 eight core processor
RAID-Server:/# cat /proc/meminfo /proc/mounts /proc/partitions
MemTotal: 8099916 kB
....
/dev/md0 /RAID xfs
rw,relatime,attr2,delaylog,sunit=2048,swidth=12288,noquota 0 0
FWIW, you don't need sunit=2048,swidth=12288 in the mount options -
they are stored on disk and the mount options are only necessary to
change the on-disk values.
They aren't. Those were created automatically, whether at creation
time or at mount time, I don't know, but the filesystem was created with
mkfs.xfs /dev/md0
and fstab contains:
/dev/md0 /RAID xfs rw 1 2
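(A guess at where those options came from: the kernel exports the
effective XFS mount options, including the on-disk sunit/swidth,
through /proc/mounts, so they show up there even though fstab never
mentions them. A quick way to compare the two, assuming the paths
above:

    grep md0 /etc/fstab
    grep md0 /proc/mounts

If the stripe options only appear in the second output, they are just
the superblock geometry being echoed back, not something that was ever
set by hand.)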
Six of the drives are 4T spindles (a mixture of makes and models).
The three drives comprising MD10 are WD 1.5T green drives. These
are in place to take over the function of one of the kicked 4T
drives. Md1, 2, and 3 are not data drives and are not suffering any
issue.
Ok, that's creative. But when you need another drive in the array
and you don't have the right spares.... ;)
Yes, but I wasn't really expecting to need 3 spares this soon or
suddenly. These are fairly new drives, and with 33% of the array being
parity, the sudden need for 3 extra drives is just not very likely.
That, plus I have quite a few 1.5 and 1.0T drives lying around in case
of sudden emergency. This isn't the first time I've replaced a single
drive temporarily with a RAID0. The performance is actually better, of
course, and for the 3 or 4 days it takes to get a new drive, it's really
not an issue. Since I have a full online backup system plus a regularly
updated off-site backup, the risk is quite minimal. This is an exercise
in mild inconvenience, not an emergency failure. If this were a
commercial system, it would be another matter, but I know for a fact
there are a very large number of home NAS solutions in place that are
less robust than this one. I personally know quite a few people who
never do backups, at all.
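For reference, the stand-in is nothing exotic: a small RAID0 built from
the spare drives and then handed to the degraded array as though it
were a single disk. A minimal sketch, with purely illustrative device
names rather than the ones on this system:

    mdadm --create /dev/md10 --level=0 --raid-devices=3 /dev/sdX /dev/sdY /dev/sdZ
    mdadm /dev/md0 --add /dev/md10

The RAID0 has to be at least as large as the member it replaces, and it
does combine the failure odds of all three spindles, which is why it is
only a stop-gap until the real drive arrives.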
I'm not sure what is meant by "write cache status" in this context.
The machine has been rebooted more than once during recovery and the
FS has been umounted and xfs_repair run several times.
Start here and read the next few entries:
http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F
I knew that, but I still don't see the relevance in this context.
There is no battery backup on the drive controller or the drives, and
the drives have all been powered down and back up several times.
Anything in any cache right now would be from some operation in the last
few minutes, not four days ago.
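For anyone checking the same thing, the volatile write cache on drives
the kernel sees directly can be queried and toggled with hdparm (this
may or may not work through a RAID HBA, and the device name here is
illustrative):

    hdparm -W /dev/sdX       # report whether the drive write cache is enabled
    hdparm -W0 /dev/sdX      # turn it off (hdparm -W1 turns it back on)

With barriers left at the XFS default of on, the drive cache is flushed
at the appropriate points anyway, which is what the FAQ entry above is
getting at.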
I don't know what the acronym BBWC stands for.
"battery backed write cache". If you're not using a hardware RAID
controller, it's unlikely you have one.
See my previous. I do have one (a 3Ware 9650E, given to me by a friend
when his company switched to zfs for their server). It's not on this
system. This array is on a HighPoint RocketRAID 2722.
The difference between a
drive write cache and a BBWC is that the BBWC is non-volatile - it
does not get lost when power drops.
Yeah, I'm aware, thanks. I just didn't cotton to the acronym.
RAID-Server:/# xfs_info /dev/md0
meta-data=/dev/md0               isize=256    agcount=43, agsize=137356288 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=5860329984, imaxpct=5
         =                       sunit=256    swidth=1536 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Ok, that all looks pretty good, and the sunit/swidth match the mount
options you set so you definitely don't need the mount options...
Yeah, I didn't set them. What did, I don't really know for certain.
See above.
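(For what it's worth, the two sets of numbers agree once the units are
lined up: the sunit/swidth mount options are expressed in 512-byte
sectors, while xfs_info reports them in 4096-byte filesystem blocks. A
quick check in the shell:

    echo $(( 2048 * 512 / 4096 ))     # 256, the sunit xfs_info shows
    echo $(( 12288 * 512 / 4096 ))    # 1536, the swidth xfs_info shows

So /proc/mounts and the superblock are describing the same geometry,
just in different units.)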
[192173.364460] [<ffffffff810fe45a>] ? vfs_fstatat+0x32/0x60
[192173.364471] [<ffffffff810fe590>] ? sys_newstat+0x12/0x2b
[192173.364483] [<ffffffff813509f5>] ? page_fault+0x25/0x30
[192173.364495] [<ffffffff81355452>] ? system_call_fastpath+0x16/0x1b
[192173.364503] XFS (md0): Corruption detected. Unmount and run xfs_repair
That last line, by the way, is why I ran umount and xfs_repair.
Right, that's the correct thing to do, but sometimes there are
issues that repair doesn't handle properly. This *was* one of them,
and it was fixed by commit e1f43b4 ("repair: update extent count
after zapping duplicate blocks") which was added to xfs_repair
v3.1.8.
IOWs, upgrading xfsprogs to the latest release and re-running
xfs_repair should fix this error.
OK. I'll scarf the source and compile. All I need is to git clone
git://oss.sgi.com/xfs/xfs and git://oss.sgi.com/xfs/cmds/xfsprogs, right?
I've never used git on a package maintained in my distro. Will I have
issues when I upgrade to Debian Jessie in a few months, since this is
not being managed by apt / dpkg? It looks like Jessie has xfsprogs
3.2.1.
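For xfs_repair itself only the xfsprogs tree should be needed, since
that is where the repair tools live. A rough sketch of a build that
stays out of dpkg's way (assuming the usual Debian build tooling, and
that the binary lands under repair/ in the build tree):

    apt-get build-dep xfsprogs       # needs deb-src lines in sources.list
    git clone git://oss.sgi.com/xfs/cmds/xfsprogs
    cd xfsprogs
    make
    ./repair/xfs_repair -V           # run it straight from the build tree

Running the freshly built binary from the tree rather than doing a
'make install' means nothing packaged gets overwritten, so the later
upgrade to Jessie's xfsprogs 3.2.1 should go through apt as normal.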
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs