Re: why doesn't the erasure code pool support random write?

2014-10-21 7:40 GMT+08:00 Lionel Bouton <lionel+ceph@xxxxxxxxxxx>:
> Hi,
>
> On 21/10/2014 01:10, 池信泽 wrote:
>
> Thanks.
>
>    Another reason is that the checksum stored in the object's
> attributes, which deep scrub uses in EC pools, must be recomputed when
> the object is modified. If random writes were supported, we would have
> to recalculate the checksum over the whole object even when only a
> single bit changed. With append-only writes, we can derive the new
> checksum from the previous checksum and the appended data, which is
> much faster.
>
>    Am I right?
>
>
> From what I understand, deep scrub doesn't use a Ceph checksum but
> compares data between OSDs (and probably uses a "majority wins" rule
> for repair). If you are using Btrfs, it will report an I/O error
> because it uses an internal checksum by default, which will force Ceph
> to use other OSDs for repair.
> I'd be glad to be proven wrong on this subject though.
No. During deep scrub, the whole object content (say 4M, if the object
size is set to 4M) is not compared byte by byte across OSDs. That would
put a heavy load on the network, since each whole 4M object would have
to be transmitted, even if the content were compressed. Instead, the
whole 4M object content is hashed into a 64-bit digest, and comparing
the digests confirms whether the content is consistent. This still
requires reading the whole 4M object from disk, which is why a plain
(non-deep) scrub only compares each object's metadata.
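To make that concrete, here is a minimal Python sketch of digest-based
scrubbing (sha256 stands in for whatever digest the OSDs actually
exchange, and the in-memory replicas stand in for the OSD read path):

import hashlib

def object_digest(data):
    # Hash the full object content (e.g. a 4M object); only the
    # digest, not the data itself, has to cross the network.
    return hashlib.sha256(data).hexdigest()

def deep_scrub_consistent(replicas):
    # Every replica of the object should produce the same digest;
    # a mismatch flags the object as inconsistent.
    return len({object_digest(r) for r in replicas}) == 1

obj = b"\x00" * (4 * 1024 * 1024)
print(deep_scrub_consistent([obj, obj, obj]))  # True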

But for an erasure-coded pool the situation changes. In a replicated
pool the content of each replica is identical to the primary's, whereas
with 4+2 erasure coding, for example, the contents of the six chunks
are all different. So for a deep scrub, all of the chunk contents would
need to be read out, transmitted to the OSD sponsoring the scrub, and
checked by recalculating the EC parity and comparing. That would be
expensive, so I am not quite clear how it works in the erasure pool
case. I'll wait for input from others.
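A rough sketch of what that verification amounts to (a toy XOR parity
stands in for the real Reed-Solomon coding, e.g. jerasure, that a 4+2
pool would use; the scrub cost is the same either way, since every
chunk has to be read and gathered in one place):

def xor_parity(chunks):
    # Toy single-parity code: XOR all data chunks together.
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

def ec_deep_scrub(data_chunks, parity_chunk):
    # The chunks are all different, so digests cannot simply be
    # compared chunk-to-chunk; the parity must be recomputed.
    return xor_parity(data_chunks) == parity_chunk

data_chunks = [bytes([i]) * 1024 for i in range(4)]
parity = xor_parity(data_chunks)
print(ec_deep_scrub(data_chunks, parity))  # True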

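Coming back to the incremental checksum point quoted at the top, a
quick CRC32 illustration (just an assumption for illustration; the
actual attribute checksum Ceph stores may differ) shows why append-only
writes are cheap to checksum while random writes are not:

import zlib

obj = b"A" * (4 * 1024 * 1024)
crc = zlib.crc32(obj)

# Append: feed only the new bytes, seeded with the previous CRC.
tail = b"B" * 4096
crc = zlib.crc32(tail, crc)
assert crc == zlib.crc32(obj + tail)

# Random write: no such shortcut; even a one-byte overwrite forces
# a re-read and re-hash of the whole object.
modified = b"C" + obj[1:] + tail
crc_full = zlib.crc32(modified)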
>
> Best regards,
>
> Lionel Bouton
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com