Hi folks:
This note is intended as an invitation to think a bit about a potentially hard problem.
There's a small body of literature suggesting that the TCP checksum is regularly failing to detect errors and that we're getting close to the point where using an MD5 authentication check will be insufficient (i.e., the TCP checksum fails to detect errors often enough that TCP passes through more errors than the MD5 check can catch). This situation is due to the growth in both total traffic and the size of individual data transfers. This is not a surprise -- it was anticipated 20 years ago, when studies showed the TCP checksum was quite weak.
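To make the weakness concrete, here's a small Python sketch (purely illustrative, not our measurement methodology) of one well-known blind spot: the Internet checksum (RFC 1071) is a one's-complement sum of 16-bit words, so it is unchanged if those words are reordered in transit.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return (~total) & 0xFFFF

original = b"\xde\xad\xbe\xef\xca\xfe"
swapped  = b"\xca\xfe\xbe\xef\xde\xad"  # same 16-bit words, reordered
# Addition is commutative, so the reordering goes undetected:
assert internet_checksum(original) == internet_checksum(swapped)
```

Swapped words are just one error class; pairs of compensating bit flips in the same column are another. The point is that the blind spots are structured, not random.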
I'm part of a team that is setting out to do a big set of measurements to figure out (a) whether the other reports that we're close to a tipping point are correct; and (b) what kinds of errors are getting through. That data will tell us whether a new checksum is warranted. We hope to know in about a year. We have time to think.
If we need a new checksum, then we are in an interesting space. There is a defined way to negotiate a new checksum in TCP (RFC 1146, now obsolete, but we can presumably unobsolete it). But all the middleboxes that like to muck with the TCP header and data would have to both (a) learn about the option and (b) implement the new checksum algorithm(s). Middleboxes are the problem because if an end system doesn't update the checksum, that's on the end system owner and their willingness to risk bad data. But if an end system updates and can't transfer data due to the middlebox's ignorance, that's a larger system problem.
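For those who haven't looked at RFC 1146 recently: the negotiation is just a TCP option carried in the SYN. A minimal sketch of the encoding, assuming the RFC's assignments (option kind 14 for Alternate Checksum Request, algorithm number 0 for the standard TCP checksum):

```python
def alt_checksum_request(algorithm_number: int) -> bytes:
    """Build a TCP Alternate Checksum Request option (RFC 1146).

    Wire format: kind=14, length=3, then one byte naming the
    checksum algorithm (0 = standard TCP checksum).
    """
    return bytes([14, 3, algorithm_number])

# Option bytes a SYN would carry to request the standard checksum:
assert alt_checksum_request(0) == b"\x0e\x03\x00"
```

Every middlebox on the path has to recognize those three bytes and the algorithm they name, which is exactly the deployment problem described above.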
Then there's UDP. UDP has no options. We could retrofit options by leveraging the reserved zero checksum value and some magic codes at the start or end of the UDP data, but that's ugly. Or we could define a UDPv2 (UDP has no version numbers either!) and give it another IP protocol number. But if we don't fix UDP, protocols above UDP, like QUIC, need fixing...
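To see how little room there is to work with: the entire UDP header (RFC 768) is eight bytes, and parsing it exhausts every field -- there is no version, no options, nothing spare except the reserved zero checksum value. (The dictionary keys below are my own labels, not protocol names.)

```python
import struct

def parse_udp_header(datagram: bytes) -> dict:
    """Unpack the fixed 8-byte UDP header (RFC 768)."""
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", datagram[:8])
    # A checksum of zero is the reserved "no checksum computed" value --
    # the only spare signal the header offers for retrofitting anything.
    return {"src": src_port, "dst": dst_port, "len": length, "cksum": checksum}

hdr = parse_udp_header(struct.pack("!HHHH", 53, 1234, 8, 0))
assert hdr["cksum"] == 0  # the one reserved value we could overload
```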
I don't think we'll need to fix IP and ICMP, as the consequences of occasional error aren't a big deal for them. A misrouted packet or unreadable ICMP in every million packets or so is probably OK.
At any rate, in a spare moment, worth pondering how one might proceed.
Thanks!
Craig
PS: Some folks may wonder if we couldn't protect ourselves by using a bigger MAC than MD5. Yes, but (a) that doesn't solve the problem for protocols that don't do message authentication; and (b) MACs are lousy checksums.
That second statement may surprise folks, so here's the one-paragraph argument. A checksum for error checking (i.e., not proof against adversaries) should be designed to detect all instances of common types of errors and, for errors outside those common types, to detect errors in proportion to the width (in bits) of the checksum. Thus, for a checksum of width W bits, we'd expect it to fail to detect an error with a probability of 1 in 2^(W+4) or better. Some newer checksums may do even better, like 1 in 2^(2W). A MAC of width W bits, by contrast, can only fail to catch errors with a probability of 1 in 2^W, due to the additional requirement to thwart an adversary (not sure this is a proven property, but it has consistently been true). So, for the same width in bits, a checksum catches many more errors -- and checksums are computationally much cheaper to compute than MACs.
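Putting numbers on that paragraph's own figures (the exponents are the claims above, not measured values):

```python
# Undetected-error probabilities per the figures in the paragraph above.
W = 32  # width in bits of the checksum or MAC

p_checksum = 2.0 ** -(W + 4)   # checksum: "1 in 2^(W+4) or better"
p_mac      = 2.0 ** -W         # MAC: "1 in 2^W"

# By these figures, an equally wide MAC lets through 2^4 = 16x as many
# of the errors that reach the check as the checksum does.
assert p_mac / p_checksum == 16.0
```

And that ratio holds at any width W, so widening the MAC only buys back what a checksum of the same width would have given you for less computation.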
--
*****
Craig Partridge's email account for professional society activities and mailing lists.