On Mon, Jun 29, 2015 at 7:55 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> On Mon, Jun 29, 2015 at 8:31 AM, Dałek, Piotr
> <Piotr.Dalek@xxxxxxxxxxxxxx> wrote:
>>> -----Original Message-----
>>> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-
>>> owner@xxxxxxxxxxxxxxx] On Behalf Of Erik G. Burrows
>>> Sent: Friday, June 26, 2015 6:49 PM
>>>
>>> All,
>>> Can someone explain to me the rationale for performing in-software CRC32
>>> hashes of all messages through the Pipe and AsyncMessage classes?
>>>
>>> On my servers, operf shows that 20% of the total CPU time in my benchmark
>>> tests are being spent in the librados ceph_crc32c_sctp function. I can see that
>>> the library is trying to use CPU accelerations if available, but what I'd like to
>>> understand is: why checksum the messages at all?
>>
>> As Somnath already wrote, you can disable CRC checking for messages. But
>> they're also used for journals, among other things, so you'll always see
>> some CPU usage spent on CRC32 calculations.
>>
>>> If the messages are local, there should not be any corruption at all, and if
>>> they are coming in over IP, then the kernel and NIC should do Layer-2/3 CRCs
>>> and reject any corrupted packets. So why re-CRC the messages at the Ceph
>>> layer?
>>
>> I can imagine data corruption coming from Ceph itself and not caught by IP
>> layers, for example due to bug in Ceph code or mainboard/RAM failure. And
>> it's a nice debug feature you can use when dealing with low-level code.
>>
>
> That's not to mention that the TCP checksum is remarkably weak. We've
> just had an incident where a broken router was quite efficiently
> corrupting something like 1/66 packets in a way which was invisible to
> the TCP checksum. Some example corruptions are here our report -- note
> that it's still a work in progress:
> https://cds.cern.ch/record/2026187/files/Adler32_Data_Corruption.pdf
>
> Thankfully CRC32-C /probably/ prevented this broken router from
> corrupting our Ceph volumes.

Yes, we have our own CRC32 checksum because loooong ago (before I
started!) Sage saw a lot of network corruption that wasn't being
caught by the TCP checksums so he added some to the Ceph message
stream. I can't tell you with any authority whatsoever how common that
problem is, but I don't think we're turning them off by default in
upstream. :)
-Greg
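
For anyone wondering how corruption can get past the TCP checksum at all, here is a minimal Python sketch of one well-known blind spot. It is not Ceph's code and it does not reproduce the router incident described above. It only shows that the 16-bit ones' complement sum (RFC 1071) is unchanged when two 16-bit words trade places, while a standard CRC-32C (Castagnoli) over the same bytes does change. The function names and the sample payload are made up for illustration, and the bitwise CRC is written for clarity, not speed.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071 (as used by TCP)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Hypothetical payload; any even-length byte string would do.
original = b"ceph-osd-data-01"
# Simulated corruption: the first two 16-bit words are swapped in transit.
corrupted = original[2:4] + original[0:2] + original[4:]

print("payload changed:        ", original != corrupted)
print("TCP-style checksum same:", internet_checksum(original) == internet_checksum(corrupted))
print("CRC-32C differs:        ", crc32c(original) != crc32c(corrupted))
```

The sum is blind to the swap because addition is commutative, so any reordering of aligned 16-bit words goes unnoticed. A CRC depends on bit position, which is why it catches this case.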