On Wed, Feb 01, 2012 at 10:52:55AM -0600, James Bottomley wrote:
> On Wed, 2012-02-01 at 11:45 -0500, Chris Mason wrote:
> > On Tue, Jan 31, 2012 at 11:28:26AM -0800, Gregory Farnum wrote:
> > > On Tue, Jan 31, 2012 at 11:22 AM, Bernd Schubert
> > > <bernd.schubert@xxxxxxxxxxxxxxxxxx> wrote:
> > > > I guess we should talk to developers of other parallel file systems and see
> > > > what they think about it. I think cephfs already uses data integrity
> > > > provided by btrfs, although I'm not entirely sure and need to check the
> > > > code. As I said before, Lustre does network checksums already and *might* be
> > > > interested.
> > >
> > > Actually, right now Ceph doesn't check btrfs' data integrity
> > > information, but since Ceph doesn't have any data-at-rest integrity
> > > verification it relies on btrfs if you want that. Integrating
> > > integrity verification throughout the system is on our long-term to-do
> > > list.
> > > We too will be sad if using a kernel-level integrity system requires
> > > using DIO, although we could probably work out a way to do
> > > "translation" between our own integrity checksums and the
> > > btrfs-generated ones if we have to (thanks to replication).
> >
> > DIO isn't really required, but doing this without synchronous writes
> > will get painful in a hurry. There's nothing wrong with letting the
> > data sit in the page cache after the IO is done though.
>
> I broadly agree with this, but even if you do sync writes and cache read
> only copies, we still have the problem of how we do the read side
> verification of DIX. In theory, when you read, you could either get the
> cached copy or an actual read (which will supply protection
> information), so for the cached copy we need to return cached protection
> information implying that we need some way of actually caching it.

Good point, reading from the cached copy is a lower level of protection
because in theory bugs in your SCSI drivers could corrupt the pages
later on.

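To make the difference between the two read paths concrete, here is a
minimal Python sketch. It is only an illustration, not kernel code:
Device, VerifiedCache and the use of zlib.crc32 as a stand-in for real
DIX protection information are all assumptions made for the example.

```python
import zlib


class Device:
    """Hypothetical block device that returns data plus a CRC with every read."""

    def __init__(self):
        self.blocks = {}  # lba -> (data, crc)

    def write(self, lba, data):
        # Store the payload together with its checksum (stand-in for DIX tags).
        self.blocks[lba] = (bytes(data), zlib.crc32(data))

    def read(self, lba):
        # An "actual read": the caller gets protection information back.
        return self.blocks[lba]


class VerifiedCache:
    """Hypothetical cache that keeps the CRC next to each cached block."""

    def __init__(self, device):
        self.device = device
        self.pages = {}  # lba -> (bytearray, crc)

    def read(self, lba):
        if lba not in self.pages:
            data, crc = self.device.read(lba)
            self.pages[lba] = (bytearray(data), crc)
        data, crc = self.pages[lba]
        # Because the CRC was cached too, a cache hit can be re-verified.
        if zlib.crc32(data) != crc:
            raise OSError(f"checksum mismatch on cached block {lba}")
        return bytes(data)
```

A cache that stored only the data could not perform the check on a hit,
so a page corrupted after the original IO would be returned silently.
That is the lower level of protection described above.
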
But I think even without keeping the CRCs attached to the page, there is
value in keeping the cached copy for many workloads. The database is
going to do an O_DIRECT read (with CRCs checked) and then stuff the data
into a database buffer cache for long-term use. Stuffing it into the
page cache on the kernel side is about the same.

-chris