Re: ll_ver_fs data verification failure - 96TB fs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nick Dokos <nicholas.dokos@xxxxxx> wrote:

> > On Aug 06, 2009  16:37 -0400, Nick Dokos wrote:
> > > I did that to begin with but the problem turns out to be much more
> > > mundane: there was an IO error on one of the volumes. It wasn't quite
> > > obvious (no red lights going off) but there *was* a message in
> > > /var/log/messages - unfortunately I missed it. I eventually recreated
> > > the error by trying to read the file with ``od -c'' and then went back
> > > and found the original error. I don't know why/how ll_ver_fs managed to
> > > read the offset and come up with a 1M difference[1] -- ``od -c'' failed with
> > > a big thud.
> > 
> > Can you have a look at the error handling in ll_ver_fs at that point?
> > It seems that it might just have re-used the previous 1MB buffer, but
> > didn't detect/report the error from the read, which would itself be bad.
> > 
> 
> It looks right to me:
> 
> ,----
> |               ...
> | 		if (read(fd, chunk_buf, chunksize) < 0) {
> | 			fprintf(stderr, "\n%s: read %s+%llu failed: %s\n",
> | 				progname, file, offset, strerror(errno));
> | 			return 1;
> | 		}
> | 		if (verify_chunk(chunk_buf, chunksize, offset, time_st,
> | 				 inode_st, file) != 0)
> | 			return 1;
> |               ...
> `----
> 
> The read() should have failed (and I should have gotten a different error
> message) but somehow it didn't - instead, verify_chunk() was called and
> *that* detected the mismatch.
> 

The offset starts at 0 and is marched along at a 1MB stride, so it's
conceivable that the read() returned 0 (avoiding the error return) and
did not touch chunk_buf (thereby leading to the verify_chunk() failure).

That would be consistent with what I got, but if so, it would be a kernel
bug somewhere in the read() calling sequence, right?

Thanks,
Nick

PS. For the record, this is the error from /var/log/messages:

,----
| ...
| Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Unhandled sense code
| Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
| Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Sense Key : Medium Error [current] 
| Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Add. Sense: No additional sense information
| Aug  5 18:14:11 shifter kernel: end_request: I/O error, dev sdn, sector 2238819328
| Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Unhandled sense code
| Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
| Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Sense Key : Medium Error [current] 
| Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Add. Sense: No additional sense information
| Aug  5 18:16:03 shifter kernel: end_request: I/O error, dev sdn, sector 2238819592
`----
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux