RE: CRC32 of messages

> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Daniel Swarbrick
> Sent: Monday, June 29, 2015 1:31 PM

> > Yes, we have our own CRC32 checksum because loooong ago (before I
> > started!) Sage saw a lot of network corruption that wasn't being
> > caught by the TCP checksums so he added some to the Ceph message
> > stream. I can't tell you with any authority whatsoever how common that
> > problem is, but I don't think we're turning them off by default in
> > upstream. :)
> 
> If the CRC32 implementation in Ceph is that dated (particularly the software
> implementations that will be used on AMD hardware), would it be worth
> checking out some of the updated implementations, such as the
> slice-by-16 or chunked methods?
> I found this link http://create.stephan-brumme.com/crc32/ and tried running
> the benchmark on an AMD Opteron 6386 SE system, with the following
> results:

First of all, this processor actually supports SSE 4.2. See here:
http://www.cpu-world.com/CPUs/Bulldozer/AMD-Opteron%206386%20SE%20-%20OS6386YETGGHK.html
In other words, it *does* support hardware CRC32 calculation.

> bitwise          : CRC=221F390F, 47.525s, 21.546 MB/s
> half-byte        : CRC=221F390F, 11.828s, 86.576 MB/s
>   1 byte  at once: CRC=221F390F, 6.347s, 161.332 MB/s
>   4 bytes at once: CRC=221F390F, 2.875s, 356.178 MB/s
>   8 bytes at once: CRC=221F390F, 2.004s, 510.932 MB/s
> 4x8 bytes at once: CRC=221F390F, 1.929s, 530.811 MB/s
>  16 bytes at once: CRC=221F390F, 1.892s, 541.179 MB/s
>  16 bytes at once: CRC=221F390F, 1.926s, 531.797 MB/s (including prefetching)
>     chunked      : CRC=221F390F, 1.919s, 533.656 MB/s
> 
> AFAIK, Ceph uses the slice-by-8 method if no hardware crc32 is found.

Slicing-by-16 uses larger lookup tables (16 KiB, versus the 8 KiB used by slicing-by-8), so switching to it would cause more cache thrashing than slicing-by-8 and, in turn, could decrease overall Ceph performance. Not to mention that in your case slicing-by-16 was only ~31 MB/s faster than the 8-bytes-at-once result, which is just 6%. IMHO, the increased memory usage is definitely not worth it.



With best regards / Pozdrawiam
Piotr Dałek





