>>>>> "hpa" == H Peter Anvin <hpa@xxxxxxxxx> writes:

>> What we really want in drives that store 520 byte sectors so that a
>> checksum can be passed all the way up and down through the stack
>> .... or something like that.
>>

hpa> A lot of SCSI disks have that option, but I believe it's not
hpa> arbitrary bytes.  In particular, the integrity check portion is
hpa> only 2 bytes, 16 bits.

It's important to distinguish between drives that support 520 byte
sectors and drives that include the Data Integrity Feature (DIF),
which also uses 520 byte sectors.

Most regular SCSI drives can be formatted with 520 byte sectors and a
lot of disk arrays use the extra space to store an internal checksum.
The downside to 520 byte sectors is that it makes buffer management a
pain, as every 512 bytes of data is followed by 8 bytes of protection
data.  That sucks when writing - say - a 4KB block because your
scatterlist becomes long and twisted, having to interleave data and
protection data every sector.

The Data Integrity Feature also uses 520 byte sectors.  The difference
is that the format of the 8 bytes is well defined, and that both
initiator and target are capable of verifying the integrity of an I/O.
It is correct that the CRC is only 16 bits.

DIF is strictly between HBA and disk.  I'm lobbying HBA vendors to
expose it to the OS so we can use it.  I'm also lobbying to get them
to allow us to submit the data and the protection data in separate
scatterlists so we don't have to do the interleaving at the OS level.

hpa> One option, of course, would be to store, say, 16
hpa> sectors/pages/blocks in 17 physical sectors/pages/blocks, where
hpa> the last one is a packing of some sort of high-powered integrity
hpa> checks, e.g. SHA-256, or even an ECC block.  This would hurt
hpa> performance substantially, but it would be highly useful for very
hpa> high data integrity applications.

A while ago I tinkered with something like that.
I actually cheated and stored the checking data in a different
partition on the same drive.  It was a pretty simple test using my DIF
code (i.e. 8 bytes per sector).  I wanted to see how badly the extra
seeks would affect us.

The results weren't too discouraging but I decided I liked the ZFS
approach better (having the checksum in the fs parent block which
you'll be reading anyway).

-- 
Martin K. Petersen      Oracle Linux Engineering
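
For illustration, here is a rough Python sketch of the 520 byte layout
discussed above: each 512 byte sector is followed by an 8 byte
protection tuple, and either end of the link can recheck it.  The
message only says the 8 bytes are well defined and that the CRC is 16
bits; the field split used here (16 bit guard CRC, 16 bit application
tag, 32 bit reference tag, big-endian, CRC polynomial 0x8BB7) is my
reading of the T10 specification and should be treated as an
assumption, as should the use of the low 32 bits of the LBA as the
reference tag.  The function names are made up for this example.

```python
import struct

SECTOR = 512            # data bytes per sector
TUPLE = 8               # protection bytes per sector
T10_DIF_POLY = 0x8BB7   # assumed guard CRC polynomial (CRC-16/T10-DIF)


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16, MSB first, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def dif_tuple(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8 byte tuple: guard CRC, application tag, reference tag."""
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)


def interleave(block: bytes, first_lba: int) -> bytes:
    """Turn N * 512 bytes of data into N * 520 bytes of data + protection."""
    if len(block) % SECTOR:
        raise ValueError("block must be a multiple of 512 bytes")
    out = bytearray()
    for i in range(0, len(block), SECTOR):
        sector = block[i:i + SECTOR]
        out += sector
        out += dif_tuple(sector, first_lba + i // SECTOR)
    return bytes(out)


def verify(buf: bytes, first_lba: int) -> bool:
    """Recheck guard and reference tag for every 520 byte sector."""
    step = SECTOR + TUPLE
    if len(buf) % step:
        return False
    for n, i in enumerate(range(0, len(buf), step)):
        sector = buf[i:i + SECTOR]
        guard, _app, ref = struct.unpack(">HHI", buf[i + SECTOR:i + step])
        if guard != crc16_t10dif(sector):
            return False
        if ref != (first_lba + n) & 0xFFFFFFFF:
            return False
    return True
```

With this layout a 4KB block grows to 8 * 520 = 4160 bytes, with a
protection tuple wedged in after every 512 bytes of data.  That is the
interleaving a separate protection scatterlist would avoid.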