Re: ll_ver_fs data verification failure - 96TB fs

Valerie Aurora <vaurora@xxxxxxxxxx> wrote:

> On Mon, Aug 03, 2009 at 09:54:36AM -0400, Nick Dokos wrote:
> > Just a heads-up for now. I ran ll_ver_fs on a 96TB fs - the write phase
> > finished without problems, but the read phase encountered a problem:
> > 
> > ...
> > read File name: /mnt/dir00373/file026
> > 
> > liverfs: verify /mnt/dir00373/file026 failed offset/timestamp/inode 3244298240/1248819541/1096796: found 3243249664/1248819541/1096796 instead
> > 
> > liverfs: Data verification failed
> > 770.45user 218639.65system 67:38:18elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
> > 100357573552inputs+195522668184outputs (1major+414minor)pagefaults 0swaps
> > make: *** [llver] Error 2
> > 
> > 
> > The offset difference is exactly 1M, and it occurs about 3GB into the file.
> 
> Interesting - exactly 1M off.  Does this correspond to anything
> interesting in extent layout or block allocation boundaries?
> 
> Any chance you can patch ll_ver_fs to continue after the first error?
> I'd be happy to write the patch for you.

I did that to begin with, but the problem turned out to be much more
mundane: there was an I/O error on one of the volumes. It wasn't
obvious (no red lights going off), but there *was* a message in
/var/log/messages; unfortunately I missed it. I eventually reproduced
the error by trying to read the file with ``od -c'' and then went back
and found the original error. I don't know why or how ll_ver_fs managed to
read the offset and come up with a 1M difference[1]; ``od -c'' failed with
a big thud.

We have now replaced the disk and I'm rerunning the test: barring further
problems, it should be done sometime next week.

> 
> > In total, there are 726 directories, each with 32 4GB files (except the last,
> > which only has 12 files). So directory 373 is roughly half-way. I'll take a look
> > at the block allocation of both the directory and the file and see if they are
> > straddling the 16TB boundary (or other such).
> 
> Did you have a chance to look at what falls before and after the 16TB
> boundary?
> 
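For a rough sense of the layout discussed above, here is a back-of-the-envelope sketch (Python, used only for illustration). It assumes that "4GB" means 4 GiB, that directories are numbered from dir00000, and that the filesystem uses 4 KiB blocks, so the "16TB boundary" is block number 2^32. It only says where dir00373 falls in creation order; it says nothing about which physical blocks the allocator actually chose, which would have to be checked with something like filefrag(8).

```python
# Back-of-the-envelope arithmetic for the ll_ver_fs test layout.
# Assumptions (not from the thread): "4GB" = 4 GiB, 4 KiB fs blocks,
# directories numbered from dir00000.
GIB = 1 << 30
TIB = 1 << 40

DIRS = 726
FILES_PER_DIR = 32
FILES_IN_LAST_DIR = 12
FILE_SIZE = 4 * GIB

total_files = (DIRS - 1) * FILES_PER_DIR + FILES_IN_LAST_DIR
total_bytes = total_files * FILE_SIZE

# With 4 KiB blocks, block number 2**32 begins at byte 2**32 * 4096 = 16 TiB.
BLOCK_SIZE = 4096
boundary_bytes = (1 << 32) * BLOCK_SIZE

# Position of dir00373 in creation order (index 373 of 0..725).
dir_index = 373
fraction = dir_index / DIRS

print("files written:", total_files)
print("data written: %.2f TiB" % (total_bytes / TIB))
print("16TB boundary at: %d TiB" % (boundary_bytes // TIB))
print("dir00373 is %.1f%% of the way through" % (100 * fraction))
```

Because the allocator is free to place files anywhere, a file's position in creation order is at best a hint about whether its extents straddle the boundary.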

I did go barking up the wrong tree for a while :-) (or should that be :-( ?)

Thanks,
Nick

[1] That's a 2-bit flip (bits 20 and 21):

3244298240 = 2#11000001011000000001000000000000
3243249664 = 2#11000001010100000001000000000000
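The footnote's arithmetic is easy to double-check. The sketch below (Python, chosen only for illustration) XORs the expected and found offsets taken from the liverfs message: every 1 bit in the result marks a position where the two values differ. Subtracting exactly 2^20 (1 MiB) from a value whose bits 21..20 are `10` turns them into `01`, which is why a 1M difference shows up here as two flipped bits.

```python
# Offsets copied from the liverfs error message in the thread.
expected = 3244298240  # offset ll_ver_fs expected to read back
found = 3243249664     # offset actually found in the file data

diff = expected - found
flipped = expected ^ found  # 1 wherever the two values differ
flipped_bits = [b for b in range(32) if (flipped >> b) & 1]

print(f"{expected:032b}")
print(f"{found:032b}")
print("difference:", diff, "bytes =", diff // (1 << 20), "MiB")  # 1048576 bytes = 1 MiB
print("differing bits:", flipped_bits)                           # [20, 21]
```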
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html