On Sat, Jun 6, 2020 at 11:47 PM Masataka Ohta
<mohta@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Craig Partridge wrote:
>
> > OK, on to what people are seeing today. This shows that 1 in every
> > 121 file transfers FTP delivers a file that, when you do the md5 sum,
> > turns out not to match the original (note there are multiple possible
> > reasons, but TCP checksum is a strong candidate).
>
> That's unreasonable because most errors are detected by datalink
> layer checksum and almost all remaining errors are detected by
> transport layer checksum, which should have been the reason
> why transport checksum need not be so strong.

I was trying to avoid this thread, but...

I'm somewhat surprised by some of the numbers (like the 1 in every
121 file transfers). As a quick check, I looked on a personal
webserver (connections from random people on the Internets): it has
received 201.1TB and 207,183,907,570 (207 billion) packets. Netstat
shows a total of 1029 (detected[0]) CRC errors, or around one every
~200M packets.

I *think* (but may be completely wrong!) that the chance of a 16-bit
checksum giving a false negative is 1 in 2^16[1], so 200M * 2^16
gives one undetected error in around every 13 trillion packets. My
average packet size is ~970 bytes, so that is ~one bad packet in
~12PB.
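Showing my work, in case anyone wants to poke holes in the arithmetic
-- a quick back-of-the-envelope in Python, using the (worst-case, per
[1]) assumption that 1 in 2^16 corrupted packets sneaks past the
checksum:

    total_bytes   = 201.1e12          # ~201.1 TB received
    total_packets = 207_183_907_570   # packets received
    crc_errors    = 1030              # detected CRC errors, bumped per [0]

    pkts_per_detected = total_packets / crc_errors
    print(pkts_per_detected)          # ~2.0e8 -> one per ~200M packets

    # Worst case: 1 in 2^16 corrupted packets passes the 16-bit checksum.
    pkts_per_undetected = pkts_per_detected * 2**16
    print(pkts_per_undetected)        # ~1.3e13 -> one per ~13 trillion

    avg_pkt = total_bytes / total_packets
    print(avg_pkt)                    # ~971 bytes per packet

    print(pkts_per_undetected * avg_pkt / 1e15)
                                      # ~12.8 -> one bad packet per ~13 PB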
I believe that L2 quality and checksums have improved sufficiently to
make up for the increase in bandwidth and data volumes -- I'm sure we
all used to watch for, and expect, errors on PSTN modems, 100Meg
Ethernet, 56k leased lines and the like. I still graph FCS/CRC errors
on router interfaces, but they are basically just empty graphs these
days...

Are others seeing much, much worse numbers from looking at their
counters?

W

[0]: Yup, it's possible that there were some number of undetected
ones (the whole debate in this thread), but assuming that there are
no systemic issues in the checksum algorithm, I believe the chance of
an error occurring and the checksum still happening to match is 1 in
2^(length of checksum) per corrupted packet -- so across the 1029
detected errors there is a ~1.6% chance (1029/2^16) of one having
slipped through. I bumped the count up to 1030 anyway.

[1]: Actually, I think I overestimate the chance of this happening --
for a corrupted packet to pass, either (at least) two corruptions
have to land in the same packet (one in the data, plus one that turns
the checksum field into the correct value for the corrupted data), or
a single corruption has to happen to leave the checksum calculation
unchanged. I don't know how to easily account for that, so I'll just
use the worst-case estimate.

> > Anecdotally, folks are reporting some middlebox vendors are not
> > updating the TCP checksum but rather letting the outbound interface
> > simply recompute the entire checksum -- which means that if the TCP
> > segment gets damaged during middlebox handling, the middlebox will
> > slap a valid checksum on bad data.
>
> That should be the real problem to make transport checksum not
> to work end to end.
>
> Thus, your proposal to have stronger checksum can not prevent
> file corruptions.
>
> So, we should make middlebox vendors to update checksum incrementally
> or, check the original checksum just before sending a packet
> with the original header (not applicable if payload is also modified).

(A sketch of the incremental-update trick is in the P.S. below.)

> Masataka Ohta
>
> PS
>
> This is a old problem documented in the original paper on
> the E2E principle.
>
> https://dl.acm.org/doi/pdf/10.1145/357401.357402
>
>     2.2 A Too-Real Example
>
>     One gateway computer developed a transient error: while copying
>     data from an input to an output buffer a byte pair was
>     interchanged, with a frequency of about one such interchange in
>     every million bytes passed.

--
I don't think the execution is relevant when it was obviously a bad
idea in the first place.
This is like putting rabid weasels in your pants, and later
expressing regret at having chosen those particular rabid weasels and
that pair of pants.
   ---maf
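P.S. On the "update checksum incrementally" point above: that's the
standard RFC 1624 method (HC' = ~(~HC + ~m + m')), and it really is
only a few lines. A toy sketch, assuming even-length data and
ignoring the +0/-0 corner cases RFC 1624 spends most of its words on
(function names are mine):

    import os

    def csum16(data):
        """Plain 16-bit ones'-complement Internet checksum (RFC 1071).
        Assumes len(data) is even, for brevity."""
        s = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
        while s > 0xFFFF:                 # fold carries back into 16 bits
            s = (s & 0xFFFF) + (s >> 16)
        return ~s & 0xFFFF

    def csum16_update(hc, old_word, new_word):
        """Update checksum hc when one 16-bit word of the covered data
        changes from old_word to new_word (RFC 1624's
        HC' = ~(~HC + ~m + m'))."""
        s = (~hc & 0xFFFF) + (~old_word & 0xFFFF) + new_word
        while s > 0xFFFF:                 # same end-around-carry fold
            s = (s & 0xFFFF) + (s >> 16)
        return ~s & 0xFFFF

    # Quick check: rewrite one word of a fake packet both ways.
    pkt = bytearray(os.urandom(40))
    hc = csum16(pkt)
    old = int.from_bytes(pkt[10:12], "big")
    pkt[10:12] = (0xBEEF).to_bytes(2, "big")
    assert csum16_update(hc, old, 0xBEEF) == csum16(pkt)

The point being that the incremental update preserves whatever
end-to-end story the original checksum told, while a full recompute
on the way out blesses whatever happens to be sitting in the buffer,
corrupted or not.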