On 21/10/2014 09:31, Nicheal wrote:
> 2014-10-21 7:40 GMT+08:00 Lionel Bouton <lionel+ceph@xxxxxxxxxxx>:
>> Hi,
>>
>> On 21/10/2014 01:10, 池信泽 wrote:
>>
>> Thanks.
>>
>> Another reason is that the checksum stored in the object's attributes,
>> which is used for deep scrub in EC pools, must be computed when the
>> object is modified. If random writes were supported, we would have to
>> recompute the checksum over the whole object even if only a single bit
>> changed. If only append writes are supported, we can derive the new
>> checksum from the previous checksum and the appended data, which is
>> much faster.
>>
>> Am I right?
>>
>> From what I understand, deep scrub doesn't use a Ceph checksum but
>> compares data between OSDs (and probably uses a "majority wins" rule
>> for repair). If you are using Btrfs, it will report an I/O error
>> because it uses an internal checksum by default, which will force Ceph
>> to use other OSDs for repair.
>> I'd be glad to be proven wrong on this subject, though.
> No, when deep scrubbing, the whole 4M object content (assuming an
> object size of 4M) is not compared byte by byte between replicas. That
> would put a high load on the network if whole 4M objects were
> transmitted, even with the object content compressed. Instead, the
> whole 4M object content is hashed into a 64-bit digest, and comparing
> the digests confirms whether the content is consistent. It still needs
> to read the whole 4M object content, which is why a plain (non-deep)
> scrub only compares each object's metadata.

What I meant is that I believe the source of the data being compared is not a checksum already stored on disk at write time (which is what I understood by "checksum in the attr of object" in the original post) that could detect bit rot by itself.
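The digest comparison Nicheal describes can be sketched roughly as follows (a hypothetical illustration, not Ceph's actual code): each OSD reads its whole local copy but sends only a small digest, and the digests are compared.

```python
import hashlib

def scrub_digest(local_copy: bytes) -> bytes:
    """Each OSD reads its whole local object and returns a small digest
    (64 bits here), so only 8 bytes per object cross the network."""
    return hashlib.sha256(local_copy).digest()[:8]

def digests_consistent(replica_digests: list) -> bool:
    """Compare the digests gathered from the replicas; any mismatch
    means the object is inconsistent and needs repair."""
    return all(d == replica_digests[0] for d in replica_digests[1:])

# Three consistent replicas, then one replica with a single flipped byte.
obj = b"x" * (4 * 1024 * 1024)      # a 4M object
corrupt = b"y" + obj[1:]
assert digests_consistent([scrub_digest(obj)] * 3)
assert not digests_consistent(
    [scrub_digest(obj), scrub_digest(obj), scrub_digest(corrupt)]
)
```

Note that this shows why the full object read is still required on every OSD even though only digests are exchanged.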
The fact that there is a network usage optimization using dynamically computed hashes doesn't change that corruption detection is done by comparing the data between peers, and not by using a checksum stored locally at write time, which would bring additional integrity guarantees (for example, it would allow repair to choose the correct replica out of 2 in pools configured with max-size 2).

Best regards,

Lionel Bouton

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
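As an aside, the original poster's point about append-only checksums can be illustrated with a running CRC32 (a hypothetical sketch under that assumption, not Ceph's actual code): an append extends the stored checksum using only the new bytes, while an overwrite anywhere in the object forces re-reading and re-hashing the whole object.

```python
import zlib

def checksum_after_append(stored_crc: int, appended: bytes) -> int:
    # Append-only path: zlib.crc32 accepts the previous value as a
    # starting state, so only the appended bytes need to be hashed.
    return zlib.crc32(appended, stored_crc)

def checksum_after_random_write(whole_object: bytes) -> int:
    # Random-write path: no such shortcut exists, so the whole object
    # must be re-read and re-hashed even if a single bit changed.
    return zlib.crc32(whole_object)

obj = b"a" * 1024
crc = zlib.crc32(obj)

# Appending updates the checksum from the old value and the new data only.
obj += b"tail"
crc = checksum_after_append(crc, b"tail")
assert crc == zlib.crc32(obj)  # matches a full recompute
```

The incremental update is O(appended bytes), which is why append-only semantics make write-time checksums cheap to maintain.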