Re: [PATCH (RFC and a half?)] network: add rule to nftables backend that zeroes checksum of DHCP responses

Laine Stump <laine@xxxxxxxxxx> · Thu, 24 Oct 2024 13:52:19 -0400

On 10/24/24 12:36 PM, Daniel P. Berrangé wrote:
On Mon, Oct 21, 2024 at 12:14:38AM -0400, Laine Stump wrote:
Many long years ago (April 2010), soon after "vhost" in-kernel packet
processing was added to the virtio-net driver, people running RHEL5
virtual machines with a virtio-net interface connected via a libvirt
virtual network noticed that when vhost packet processing was enabled,
their VMs could no longer get an IP address via DHCP - the guest was
ignoring the DHCP response packets sent by the host.

The (as danpb calls them) "gory details" of this are chronicled here:

   https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001835.html

but basically it was because the checksum of packets wasn't being
fully computed on the host side (because the host had checksum
offloading enabled and thought that it would be taken care of later,
e.g. with NIC hardware), while these packets going from a tap device
to a virtio-net NIC in a guest wouldn't get that service, and the
packets would arrive with a "bad checksum".

AFAIR, it isn't actually a bug with virtio-net usage as this last
bit suggests. Rather it is a result of feature negotiation with QEMU
on the host, whereby the guest & QEMU mutually agree to turn off
checksums because they are redundant when the "link" is just local
memory not a physical cable.

IOW, packets don't arrive in the guest with a bad checksum. They
arrive in the guest with no checksum *as requested* by the guest.

The DHCP client decides this is a bad checksum, as it wasn't
aware of the checksum offload usage.

The "fix" for this ended up being that iptables added a new
"--checksum-fill" action, and libvirt added an iptables rule for each
virtual network to match DHCP response packets and perform
--checksum-fill.

In the meantime, the ISC DHCP package (which contains the dhclient
program that had been rejecting the bad checksum packets) made a
separate fix to their dhclient which caused it to accept packets
anyway even if they didn't have a proper checksum (NB: that's not a
full explanation, and possibly not accurate). The word at the time
from those "in the know" was that the bad checksum problem was really
specific to ISC's dhclient, and so once their fix was in use
everywhere dhclient was used, the problem would be a thing of the past
and the checksum fixup iptables rules would no longer be needed (but
would otherwise be harmless if it was still there).

The fix did indeed work correctly for dhclient.... on linux !

The fix relied on a Linux specific sockets API extension, and
thus wasn't applicable to non-Linux codepaths in dhclient AFAICT.

Based on this information (and also due to the opinion that fixing the
problem by having iptables modify the packet checksum was the wrong
way to fix things), the nftables developers made the decision to not
implement an equivalent to --checksum-fill in nftables. As a result,
when I wrote the nftables firewall backend for libvirt virtual
networks, it didn't add in any rule to "fix" broken UDP checksums
(after all, that was fixed somewhere else 14 years ago, right???)

....and in Fedora/RHEL context it was fixed 18 years ago, as we
first hit this when working on Xen integration in 2006 :-)

A few quick tests proved that it was the same old "bad checksum"
problem from 2010 come back to haunt us.

2006 :-)

After some discussion with Phil Sutter and Eric Garver (nftables
people), they suggested that, while nftables doesn't have an action
that will *compute* the checksum of a packet, it *does* have an action
that will set the checksum to 0, and that maybe we should try
that. Then Phil tried it himself by manually adding such a rule to a
running system, and verified that it did fix the issue at least for
FreeBSD guests.

So over the weekend I came up with a patch to add a checksum 0 rule to
the rules setup for each virtual network. This is that patch.

I have so far verified that this patch enables FreeBSD to receive the
DHCP response and get an IP address, and that it hasn't *broken* this
functionality for a random old Fedora image I had (Fedora 27!?!?! I
really need to update my test images!!). Before pushing it I would
like to verify that zeroing the checksum of DHCP response packets
doesn't break any other guest, so I would appreciate the help of
anyone who could build and install libvirt with this patch and let me
know of both successes and failures of any guest to acquire an IP
address with DHCP. Once I've received enough positive reports (and 0
negative reports!) then we can think about pushing this patch (and
also backporting it downstream to Fedora 40)

On the one hand it is good that you tested this and found it to
work.

What concerns me is a lack of understanding of /why/ it works.

AFAICT there is nothing in the UDP RFC documenting all-zeros
as a special case for indicating absent checksums.

Well, RFC768 says this:

> If the computed  checksum  is zero,  it is transmitted  as all ones
> (the equivalent  in one's complement  arithmetic).   An all zero
> transmitted checksum  value means that the transmitter  generated
> no checksum  (for debugging or for higher level protocols that
> don't care).

(so a checksum of 0 as an actual computed checksum is never possible,
and there are cases where a sender might not compute a checksum and the
receiver *could* still accept the packet as valid).
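
As a rough illustration of the RFC 768 rule quoted above, here is a minimal Python sketch of the IPv4 UDP checksum computation. It is not taken from the patch or from any DHCP implementation. The function names are made up for this example, and it assumes the caller passes the UDP segment with its checksum field already zeroed.

```python
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum; odd-length input is padded with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    """Checksum for an IPv4 UDP segment whose checksum field is zeroed."""
    # Pseudo header: source, destination, zero byte, protocol 17, UDP length.
    pseudo_header = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    computed = ~ones_complement_sum(pseudo_header + udp_segment) & 0xFFFF
    # RFC 768: a computed value of zero is transmitted as all ones, so a
    # zero on the wire can only mean "the transmitter generated no checksum".
    return computed or 0xFFFF
```

Because of the final substitution, the function never returns 0, which is what leaves the all-zero value free to act as the "no checksum" marker.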

The Wikipedia entry for UDP says this:

> UDP checksum computation is optional for IPv4. If a checksum is
> not used it should be set to the value zero.

Again implying that 0 can be used to indicate that a checksum wasn't 
computed but the packet can still be accepted.

Another indicator is that tcpdump will display "[udp sum ok]" if the 
checksum in the packet matches what is computed, and "[no cksum]" if the 
checksum is 0, but "[bad udp cksum 0xXXXX -> 0xYYYY!]" if the checksum 
is any other value and not correct. So this *implies* special treatment 
of a checksum of 0.

Also there is an entire RFC (6936) dedicated to the topic

"Applicability Statement for the Use of IPv6 UDP Datagrams with Zero 
Checksums"

but of course that's not applicable here. I didn't find a similar RFC 
for IPv4.

I'd really like to know /why/ it works, so we can be confident
we're relying on intentional behaviour, as opposed to a happy
accident.

Many searches have led me to statements that in IPv4 UDP packets, a 
checksum of 0 means "checksum not done/needed", but no official document 
that says "if a UDP packet has a checksum of 0 it MUST be accepted as 
valid". So .... *shrug*.

Functionally your patch does what it claims to do, so codewise
I'm happy to say Reviewed-by: Daniel P. Berrangé <berrange@xxxxxxxxxx>,
but I'd rather not merge it without a deeper understanding.

Just as I was hitting send jdenemar notified me you'd responded to 
yourself with one of the same quotes :-).

I want to fix the commit log message based on your more-correct info, so 
I'll try to do that and re-send so you can verify and we can get history 
correct, but I might not get it done until later tonight. If that 
happens, I'll leave a message requesting that you push it if the log 
message is okay (or just correct it and push that)

Thanks for investigating !! :-)