Hello,

On Wed, 15 Jun 2016 08:48:57 +0200 Gandalf Corvotempesta wrote:

> On 15 Jun 2016 03:27, "Christian Balzer" <chibi@xxxxxxx> wrote:
> > And that makes deep-scrubbing something of quite limited value.
>
> This is not true. Did you read what I and Jan wrote?
> If you checksum *before* writing to disk (so when data is still in ram)
> then when reading back from disk you could do the checksum verification
> and if doesn't match you can heal from the other nodes
>
Very true, and Ceph does replicate client writes from memory.

However, Ceph doesn't do any checksum verification on reads, so
potentially corrupted data can and will be served to the clients.

The only time the "healing" can happen is during deep-scrubs (if the data
corruption is persistent and not random), and that is of course possibly
long (up to a week with default values) after the corrupt data has been
served to a client.

This is why people are using BTRFS and ZFS for filestore (despite the
problems they in turn create) and why the roadmap for bluestore has
checksums for reads on it as well (or so we've been told).

> Obviously you have to replicate directly from ram when bitrot couldn't
> happen.
> if you write to disk and then replicate the wrote data you could
> replicate a rotted value.

Which is exactly what could happen if you have any kind of data movement,
be it re-weighting OSDs, adding new ones, or even the snapshot scenario Jan
mentioned.
Because in these cases the data is read from the primary PG, i.e. from
disk.

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
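
To make the checksum-on-write, verify-on-read idea discussed above concrete, here is a minimal toy sketch in Python. It is not Ceph code and does not use any Ceph API: the `Replica` class and the `replicated_write` and `verified_read` functions are hypothetical names invented for illustration, and CRC32 is just an assumed stand-in for whatever checksum a real store would use. The checksum is computed once while the data is still in RAM, stored next to each copy, checked on every read, and a mismatching copy is repaired from a replica that still verifies.

```python
import zlib


class Replica:
    """Toy stand-in for one node's on-disk store (hypothetical, not a Ceph API)."""

    def __init__(self):
        # object name -> (payload as bytearray, CRC32 recorded at write time)
        self.objects = {}

    def write(self, name, data, crc):
        self.objects[name] = (bytearray(data), crc)

    def read_verified(self, name):
        """Return the payload, or None if it no longer matches its checksum."""
        data, crc = self.objects[name]
        if zlib.crc32(bytes(data)) != crc:
            return None
        return bytes(data)


def replicated_write(replicas, name, data):
    """Checksum the data while it is still in RAM, then send it to every replica."""
    crc = zlib.crc32(data)
    for replica in replicas:
        replica.write(name, data, crc)


def verified_read(replicas, name):
    """Read from the first replica that verifies and repair any bad ones seen so far."""
    bad = []
    for replica in replicas:
        data = replica.read_verified(name)
        if data is not None:
            for broken in bad:
                # Heal the rotted copy from the copy that passed verification.
                broken.write(name, data, zlib.crc32(data))
            return data
        bad.append(replica)
    raise IOError("no replica of %r passed checksum verification" % name)
```

With three `Replica` instances, flipping a byte in the first replica's stored payload and then calling `verified_read` returns the original data from the second replica and rewrites the first one. Without the check in `read_verified`, the flipped byte would simply be handed back to the caller, which is the behaviour described above for reads that are not verified.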