Martin K. Petersen wrote:
"hpa" == H Peter Anvin <hpa@xxxxxxxxx> writes:
What we really want is drives that store 520 byte sectors so that a
checksum can be passed all the way up and down through the stack
.... or something like that.
hpa> A lot of SCSI disks have that option, but I believe it's not
hpa> arbitrary bytes. In particular, the integrity check portion is
hpa> only 2 bytes, 16 bits.
It's important to distinguish between drives that support 520 byte
sectors and drives that include the Data Integrity Feature which also
uses 520 byte sectors.
Most regular SCSI drives can be formatted with 520 byte sectors and a
lot of disk arrays use the extra space to store an internal checksum.
The downside to 520 byte sectors is that it makes buffer management a
pain as 512 bytes of data is followed by 8 bytes of protection data.
That sucks when writing - say - a 4KB block because your scatterlist
becomes long and twisted having to interleave data and protection
data every sector.
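
To put a number on the scatterlist cost, here is a rough sketch in Python (not kernel code) that counts the segments one 4KB write needs when every 512 bytes of data must be followed by 8 bytes of protection data. The function name and the (source, offset, length) tuples are made up for illustration; a real scatterlist is a kernel structure and looks nothing like this.

```python
SECTOR_DATA = 512   # bytes of user data per sector
SECTOR_PROT = 8     # bytes of protection data per sector

def interleaved_segments(block_size):
    """Return (source, offset, length) segments for one block when data
    and protection data have to alternate sector by sector."""
    if block_size % SECTOR_DATA:
        raise ValueError("block size must be a multiple of 512")
    segments = []
    for i in range(block_size // SECTOR_DATA):
        # 512 bytes taken from the data buffer...
        segments.append(("data", i * SECTOR_DATA, SECTOR_DATA))
        # ...followed by 8 bytes taken from the protection buffer
        segments.append(("prot", i * SECTOR_PROT, SECTOR_PROT))
    return segments

segs = interleaved_segments(4096)
print(len(segs))                              # 16 segments instead of 1
print(sum(length for _, _, length in segs))   # 4160 bytes transferred
```

With separate data and protection buffers the same write would need only two segments, one per buffer.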
The data integrity feature also uses 520 byte sectors. The
difference is that the format of the 8 bytes is well defined, and
that both initiator and target are capable of verifying the integrity
of an I/O. It is correct that the CRC is only 16 bits.
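
For illustration, a minimal sketch of the 8 bytes as laid out by T10 DIF: a 16-bit guard tag (the CRC of the 512 data bytes), a 16-bit application tag, and a 32-bit reference tag, all big-endian. The guard CRC uses polynomial 0x8BB7 with an initial value of zero. Using the low 32 bits of the LBA as the reference tag is the Type 1 convention and is an assumption here, and the helper names are invented for this example.

```python
import struct

def crc16_t10dif(data):
    """Bitwise CRC-16, polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def make_dif_tuple(sector, lba, app_tag=0):
    """Build the 8 protection bytes for one 512-byte sector."""
    if len(sector) != 512:
        raise ValueError("sector must be exactly 512 bytes")
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)

def verify_sector(sector, dif, lba):
    """Check guard and reference tags, as either end of the link could."""
    guard, _app_tag, ref = struct.unpack(">HHI", dif)
    return guard == crc16_t10dif(sector) and ref == (lba & 0xFFFFFFFF)

sector = bytes(range(256)) * 2
dif = make_dif_tuple(sector, lba=1234)
print(len(dif), verify_sector(sector, dif, lba=1234))   # 8 True
```

Because the reference tag is tied to the LBA, a sector written to or read from the wrong location fails verification even when its data and CRC are intact.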
When last I looked at Hamming code, and that would be 1989 or 1990, I
believe that I learned that the number of Hamming bits needed to cover N
data bits was 1+log2(N), which for 512 bytes would be 1+12, and fit into
a 16 bit field nicely. I don't know that I would go that way (fix any
one-bit error, detect any two-bit error) rather than use a CRC, which
leaves only one chance in 64k of an undetected data error, but I find
it interesting.
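
As a quick check on that arithmetic: the usual Hamming condition is that r check bits cover m data bits when 2^r >= m + r + 1, and one more overall parity bit is needed to also detect double-bit errors. The sketch below only evaluates that bound for a 512-byte sector; it is not an encoder, and the function name is made up.

```python
def hamming_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r

data_bits = 512 * 8                    # 4096 data bits per sector
sec = hamming_check_bits(data_bits)    # correct any one-bit error
secded = sec + 1                       # plus overall parity: detect two-bit errors
print(sec, secded)                     # 13 14

# For comparison, the chance that a random corruption slips past a 16-bit CRC
print(1 / 2**16)
```

Either variant fits in a 16-bit field, which agrees with the 1+12 figure above.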
I also looked at Fire codes, which at the time would still be a viable
topic for a thesis. I remember nothing about how they worked whatsoever.
DIF is strictly between HBA and disk. I'm lobbying HBA vendors to
expose it to the OS so we can use it. I'm also lobbying to get them
to allow us to submit the data and the protection data in separate
scatterlists so we don't have to do the interleaving at the OS level.
hpa> One option, of course, would be to store, say, 16
hpa> sectors/pages/blocks in 17 physical sectors/pages/blocks, where
hpa> the last one is a packing of some sort of high-powered integrity
hpa> checks, e.g. SHA-256, or even an ECC block. This would hurt
hpa> performance substantially, but it would be highly useful for very
hpa> high data integrity applications.
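
A minimal sketch of the 16-in-17 idea, assuming 512-byte sectors and SHA-256 as the integrity check. Sixteen 32-byte digests happen to fill exactly one 512-byte sector, so the seventeenth sector of each group can hold one digest per data sector. The mapping and all function names here are hypothetical, not an existing on-disk format.

```python
import hashlib

GROUP = 16          # data sectors per group
SECTOR = 512        # bytes per sector
DIGEST = 32         # bytes per SHA-256 digest; 16 * 32 == 512

def physical_sector(logical):
    """Map a logical data sector to its physical sector, skipping one
    checksum sector after every 16 data sectors."""
    return logical + logical // GROUP

def checksum_sector(group):
    """Physical sector that holds the digests for the given group."""
    return group * (GROUP + 1) + GROUP

def build_checksum_sector(sectors):
    """Pack the SHA-256 digests of 16 data sectors into one sector."""
    if len(sectors) != GROUP or any(len(s) != SECTOR for s in sectors):
        raise ValueError("need exactly 16 sectors of 512 bytes")
    return b"".join(hashlib.sha256(s).digest() for s in sectors)

def verify(index, sector, checksum_block):
    """Check one data sector against its slot in the checksum sector."""
    expected = checksum_block[index * DIGEST:(index + 1) * DIGEST]
    return hashlib.sha256(sector).digest() == expected

group = [bytes([i]) * SECTOR for i in range(GROUP)]
block = build_checksum_sector(group)
print(len(block), verify(3, group[3], block))           # 512 True
print(physical_sector(16), checksum_sector(0))          # 17 16
```

The performance cost shows up in the mapping: every write to a data sector also dirties its group's checksum sector, and a read that wants verification has to fetch both.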
A while ago I tinkered with something like that. I actually cheated
and stored the checking data in a different partition on the same
drive. It was a pretty simple test using my DIF code (i.e. 8 bytes
per sector).
I wanted to see how badly the extra seeks would affect us. The
results weren't too discouraging but I decided I liked the ZFS
approach better (having the checksum in the fs parent block which
you'll be reading anyway).
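
The parent-block idea can be sketched roughly as follows: the pointer kept in the parent carries the child's checksum, so verifying a block needs no extra read beyond the parent that was already fetched to find it. This is a toy in-memory model, not ZFS's actual on-disk format; SHA-256 is used as a stand-in checksum and the class and method names are invented.

```python
import hashlib

class BlockStore:
    """Toy block device: a dict of address -> bytes."""

    def __init__(self):
        self.blocks = {}

    def write(self, addr, data):
        """Store a block and return the pointer the parent should keep:
        the child's address together with the child's checksum."""
        self.blocks[addr] = data
        return (addr, hashlib.sha256(data).digest())

    def read(self, pointer):
        """Read a block and verify it against the checksum in the pointer."""
        addr, checksum = pointer
        data = self.blocks[addr]
        if hashlib.sha256(data).digest() != checksum:
            raise IOError("checksum mismatch at block %d" % addr)
        return data

store = BlockStore()
pointer = store.write(7, b"file contents")   # pointer lives in the parent block
print(store.read(pointer))                   # b'file contents'
```

Since the checksum is stored away from the data it covers, a block that is silently overwritten or misdirected cannot also corrupt its own check value.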
--
bill davidsen <davidsen@xxxxxxx>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979