Valerie Aurora <vaurora@xxxxxxxxxx> wrote:

> On Mon, Aug 03, 2009 at 09:54:36AM -0400, Nick Dokos wrote:
> > Just a heads-up for now. I ran ll_ver_fs on a 96TB fs - the write phase
> > finished without problems, but the read phase encountered a problem:
> >
> > ...
> > read File name: /mnt/dir00373/file026
> >
> > liverfs: verify /mnt/dir00373/file026 failed offset/timestamp/inode 3244298240/1248819541/1096796: found 3243249664/1248819541/1096796 instead
> >
> > liverfs: Data verification failed
> > 770.45user 218639.65system 67:38:18elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
> > 100357573552inputs+195522668184outputs (1major+414minor)pagefaults 0swaps
> > make: *** [llver] Error 2
> >
> >
> > The offset difference is exactly 1M, and it occurs about 3GB into the file.
>
> Interesting - exactly 1M off. Does this correspond to anything
> interesting in extent layout or block allocation boundaries?
>
> Any chance you can patch ll_ver_fs to continue after the first error?
> I'd be happy to write the patch for you.

I did that to begin with but the problem turns out to be much more
mundane: there was an IO error on one of the volumes. It wasn't quite
obvious (no red lights going off) but there *was* a message in
/var/log/messages - unfortunately I missed it. I eventually recreated
the error by trying to read the file with ``od -c'' and then went back
and found the original error. I don't know why/how ll_ver_fs managed
to read the offset and come up with a 1M difference[1] -- ``od -c''
failed with a big thud.

We have now replaced the disk and I'm doing the test again: it should
be done (barring further problems) by sometime next week.

>
> > In total, there are 726 directories, each with 32 4GB files (except the last,
> > which only has 12 files). So directory 373 is roughly half-way. I'll take a look
> > at the block allocation of both the directory and the file and see if they are
> > straddling the 16TB boundary (or other such).
>
> Did you have a chance to look at what falls before and after the 16TB
> boundary?
>

I did go barking up the wrong tree for a while :-) (or should that be :-( ?)

Thanks,
Nick

[1] that's a 2-bit flip:

3244298240 = 2#11000001011000000001000000000000
3243249664 = 2#11000001010100000001000000000000
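For anyone who wants to double-check the arithmetic in footnote [1], here
is a minimal sketch in Python. It only uses the two offsets from the error
message above; the variable names are made up for illustration and nothing
here comes from ll_ver_fs itself.

```python
# Offsets copied from the verification failure message.
expected = 3244298240  # offset the verifier expected to find
found = 3243249664     # offset it actually read back

# Byte difference between the two offsets.
diff = expected - found

# XOR leaves a 1 in every bit position where the two values disagree.
flipped = expected ^ found
flipped_bits = [i for i in range(flipped.bit_length()) if (flipped >> i) & 1]

print(f"difference   : {diff} bytes = {diff // 2**20} MiB")
print(f"expected     : {expected:032b}")
print(f"found        : {found:032b}")
print(f"differing    : {flipped:032b}")
print(f"bit positions: {flipped_bits}")
print(f"into the file: {expected / 2**30:.2f} GiB")
```

Two bits differ (bit 21 is cleared and bit 20 is set in the value that was
read back), so the net change is 2^21 - 2^20 = 2^20 bytes, which is why the
offsets are exactly 1M apart even though it is not a single-bit flip.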