>>>>> "Greg" == Greg Freemyer <greg.freemyer@xxxxxxxxx> writes:

Greg> I also see Device Mapper support was discussed in Oct. (My 2.6.27
Greg> kernel does not have those patches). See below.

Greg> Is there a more comprehensive write-up / resource that describes
Greg> the current status of the overall INTEGRITY support is,

http://oss.oracle.com/projects/data-integrity/documentation/

The status is:

 - The infrastructure in the kernel is in place as of .27. Hoping to
   get MD/DM support in .29 but I'm running late wrt. the merge window.

 - We recently announced an early adopter program for Oracle DB
   customers. The ASM component of the database now supports the
   integrity hooks so we can do true end-to-end integrity protection of
   DB I/O.

 - btrfs support is work in progress.

 - Other people have expressed interest in adding support to ext4 and
   XFS.

Greg> especially as it relates to ATA devices?

ATA support was put on hold in the T13 committee because the drive
vendors don't feel like adding a big, intrusive feature to their
firmware.

I'm still hoping we can eventually get support added to nearline-class
drives but it'll be a while. Market demand needs to be there first.
I.e. the array vendors that use SATA drives will need to start asking
for it.

We're just, just, just starting to push out FC support. Then comes
SAS. And then hopefully ATA.

Greg> ie. Do actual ATA hardware devices that support "T13/ATA External
Greg> Path Protection" exist yet? Does it require HDD and controller
Greg> support? Or just HDD?

Both.

You could emulate some of the DIX features in software (like
scatterlist interleaving) and then plug in the long commands on the
back end. But as Mark said, the checksum formats differ between drive
vendors/models.

On SCSI you could conceivably use the block integrity stuff to store an
LVM/MD checksum when used with devices that expose the application
tag. However, it's only a 16-bit field (16 bits - 1 to be exact) so
it's not exactly a lot of space.
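For reference, a DIF-style 520-byte sector carries an 8-byte tuple after
the 512 data bytes: a 16-bit guard tag (a CRC of the data), the 16-bit
application tag, and a 32-bit reference tag. The Python sketch below
builds and checks such a tuple. It assumes T10 DIF Type 1 semantics
(guard = CRC-16 with polynomial 0x8BB7, reference tag = low 32 bits of
the LBA) and treats an application tag of 0xFFFF as the escape value
that disables checking, which is presumably where the "16 bits - 1"
comes from. The function names are hypothetical and are not a kernel
API.

```python
import struct

SECTOR_SIZE = 512
# Assumption: 0xFFFF is the escape value that disables checking,
# so it cannot be used to store caller data.
APP_TAG_ESCAPE = 0xFFFF


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: poly 0x8BB7, init 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect_sector(data: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Hypothetical helper: return 512 data bytes followed by an 8-byte PI tuple."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("sector data must be exactly 512 bytes")
    if not 0 <= app_tag < APP_TAG_ESCAPE:
        raise ValueError("application tag must fit in 16 bits and not be 0xFFFF")
    guard = crc16_t10dif(data)
    ref_tag = lba & 0xFFFFFFFF  # Type 1 style: low 32 bits of the target LBA
    # Big-endian: 2-byte guard, 2-byte application tag, 4-byte reference tag.
    return data + struct.pack(">HHI", guard, app_tag, ref_tag)


def verify_sector(sector: bytes, expected_lba: int) -> bool:
    """Hypothetical helper: check the guard and reference tags of a 520-byte sector."""
    data, pi = sector[:SECTOR_SIZE], sector[SECTOR_SIZE:]
    guard, app_tag, ref_tag = struct.unpack(">HHI", pi)
    if app_tag == APP_TAG_ESCAPE:
        return True  # escape value: checking disabled for this sector
    return guard == crc16_t10dif(data) and ref_tag == (expected_lba & 0xFFFFFFFF)
```

The guard tag catches corrupted data and the reference tag catches
misdirected writes; the application tag is the only field left over for
the owner of the data to use.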
And only dumb drives are going to make the application tag available.
Some RAID controllers are going to keep those 16 bits for their own
internal use.

The main purpose of the block integrity stuff is to protect in-flight
I/O. Persistence is an optional feature and a side-effect.

So I think it would be much more worthwhile to implement checksumming
in MD/DM without relying on special hardware.

I did some experiments in that department a few years ago when we were
investigating how to go about fixing some of the data integrity
problems in Linux. I wrote something akin to DIF in software by doing
64 512-byte blocks + 512 bytes of checksums. The disadvantage there is
having to do read-modify-write for small writes. I tried several other
approaches sacrificing both space and locality but performance was
still anemic.

The reason DIF is implemented the way it is (with 520-byte sectors: 512
bytes of data followed by 8 bytes of protection information) is to
avoid the cost of seeking to write the protection information
elsewhere. With solid state devices that seek penalty doesn't exist, so
this may become less of an issue going forward.

The beauty of checksumming in btrfs is that the checksum is stored in
the filesystem metadata, which is read/written anyway. So the only
overhead is in calculating the actual checksum. That's something
virtual block devices have a much harder time providing because they
don't have metadata describing individual blocks. That doesn't mean it
can't be done but it's a lot more work.

I'm personally much more interested in adding a retry-other-mirror
interface to MD/DM and leaving the checksumming to the filesystems.

-- 
Martin K. Petersen	Oracle Linux Engineering