Re: [PATCH (RFC and a half?)] network: add rule to nftables backend that zeroes checksum of DHCP responses

Daniel P. Berrangé <berrange@xxxxxxxxxx> · Thu, 24 Oct 2024 17:36:10 +0100

On Mon, Oct 21, 2024 at 12:14:38AM -0400, Laine Stump wrote:
> Many long years ago (April 2010), soon after "vhost" in-kernel packet
> processing was added to the virtio-net driver, people running RHEL5
> virtual machines with a virtio-net interface connected via a libvirt
> virtual network noticed that when vhost packet processing was enabled,
> their VMs could no longer get an IP address via DHCP - the guest was
> ignoring the DHCP response packets sent by the host.
> 
> The (as danpb calls them) "gory details" of this are chronicled here:
> 
>   https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001835.html
> 
> but basically it was because the checksum of packets wasn't being
> fully computed on the host side (because the host had checksum
> offloading enabled and thought that it would be taken care of later,
> e.g. with NIC hardware), while these packets going from a tap device
> to a virtio-net NIC in a guest wouldn't get that service, and the
> packets would arrive with a "bad checksum".

AFAIR, it isn't actually a bug with virtio-net usage as this last
bit suggests. Rather it is a result of feature negotiation with QEMU
on the host, whereby the guest & QEMU mutually agree to turn off
checksums because they are redundant when the "link" is just local
memory not a physical cable.

IOW, packets don't arrive in the guest with a bad checksum. They
arrive in the guest with no checksum *as requested* by the guest.

The DHCP client decides this is a bad checksum, as it wasn't
aware of the checksum offload usage.
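To make the failure mode concrete, here is a minimal Python sketch of the IPv4/UDP checksum arithmetic (RFC 768 pseudo-header, RFC 1071 one's complement sum). It contrasts a strict receiver-side check on a reply whose checksum field was never finished with the same reply after a "fill" step of the kind --checksum-fill performs. This is not the kernel, iptables or dhclient code. The function names, the addresses, the toy payload (not a real DHCP message) and the modelling of the unfinished field as holding only the pseudo-header sum are all illustrative assumptions.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """RFC 1071 16-bit one's complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header: src, dst, zero byte, protocol 17 (UDP), UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_len))


def build_udp(sport: int, dport: int, payload: bytes, csum: int = 0) -> bytes:
    """Hypothetical helper: UDP header (checksum at bytes 6-7) plus payload."""
    return struct.pack("!HHHH", sport, dport, 8 + len(payload), csum) + payload


def fill_udp_checksum(src_ip: str, dst_ip: str, udp: bytes) -> bytes:
    """Return a copy of `udp` with its checksum field fully computed."""
    zeroed = udp[:6] + b"\x00\x00" + udp[8:]  # field must be zero while summing
    s = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(zeroed)) + zeroed)
    csum = ~s & 0xFFFF
    if csum == 0:
        csum = 0xFFFF  # RFC 768: a computed zero is transmitted as all ones
    return zeroed[:6] + struct.pack("!H", csum) + zeroed[8:]


def udp_checksum_ok(src_ip: str, dst_ip: str, udp: bytes) -> bool:
    """Strict receiver-side check: summing everything, including the
    checksum field, must give 0xFFFF."""
    return ones_complement_sum(pseudo_header(src_ip, dst_ip, len(udp)) + udp) == 0xFFFF


SRC, DST = "192.168.122.1", "192.168.122.100"
payload = b"hello"  # toy payload, not a DHCP message

# Assumed model of an unfinished offload: the field holds only the
# pseudo-header sum, with the rest left for later (hardware) processing.
placeholder = ones_complement_sum(pseudo_header(SRC, DST, 8 + len(payload)))
unfilled = build_udp(67, 68, payload, csum=placeholder)
filled = fill_udp_checksum(SRC, DST, unfilled)

print(udp_checksum_ok(SRC, DST, unfilled))         # False
print(hex(struct.unpack("!H", filled[6:8])[0]))    # 0x45c4
print(udp_checksum_ok(SRC, DST, filled))           # True
```

A client that verifies every reply this way rejects the unfilled packet, while the filled copy passes.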

> The "fix" for this ended up being that iptables added a new
> "--checksum-fill" action, and libvirt added an iptables rule for each
> virtual network to match DHCP response packets and perform
> --checksum-fill.
> 
> In the meantime, the ISC DHCP package (which contains the dhclient
> program that had been rejecting the bad checksum packets) made a
> separate fix to their dhclient which caused it to accept packets
> anyway even if they didn't have a proper checksum (NB: that's not a
> full explanation, and possibly not accurate). The word at the time
> from those "in the know" was that the bad checksum problem was really
> specific to ISC's dhclient, and so once their fix was in use
> everywhere dhclient was used, the problem would be a thing of the past
> and the checksum fixup iptables rules would no longer be needed (but
> would otherwise be harmless if it was still there).

The fix did indeed work correctly for dhclient... on Linux!

The fix relied on a Linux-specific sockets API extension, and
thus wasn't applicable to non-Linux codepaths in dhclient AFAICT.

> Based on this information (and also due to the opinion that fixing the
> problem by having iptables modify the packet checksum was the wrong
> way to fix things), the nftables developers made the decision to not
> implement an equivalent to --checksum-fill in nftables. As a result,
> when I wrote the nftables firewall backend for libvirt virtual
> networks, it didn't add in any rule to "fix" broken UDP checksums
> (after all, that was fixed somewhere else 14 years ago, right???)

....and in Fedora/RHEL context it was fixed 18 years ago, as we
first hit this when working on Xen integration in 2006 :-)

> A few quick tests proved that it was the same old "bad checksum"
> problem from 2010 come back to haunt us.

2006 :-)

> After some discussion with Phil Sutter and Eric Garver (nftables
> people), they suggested that, while nftables doesn't have an action
> that will *compute* the checksum of a packet, it *does* have an action
> that will set the checksum to 0, and that maybe we should try
> that. Then Phil tried it himself by manually adding such a rule to a
> running system, and verified that it did fix the issue at least for
> FreeBSD guests.
> 
> So over the weekend I came up with a patch to add a checksum 0 rule to
> the rules setup for each virtual network. This is that patch.
> 
> I have so far verified that this patch enables FreeBSD to receive the
> DHCP response and get an IP address, and that it hasn't *broken* this
> functionality for a random old Fedora image I had (Fedora 27!?!?! I
> really need to update my test images!!). Before pushing it I would
> like to verify that zeroing the checksum of DHCP response packets
> doesn't break any other guest, so I would appreciate the help of
> anyone who could build and install libvirt with this patch and let me
> know of both successes and failures of any guest to acquire an IP
> address with DHCP. Once I've received enough positive reports (and 0
> negative reports!) then we can think about pushing this patch (and
> also backporting it downstream to Fedora 40).

On the one hand, it is good that you tested this and found it to
work.

What concerns me is a lack of understanding of /why/ it works.

AFAICT there is nothing in the TCP RFC documenting all-zeros
as a special case for indicating absent checksums.

I'd really like to know /why/ it works, so we can be confident
we're relying on intentional behaviour, as opposed to a happy
accident.

Functionally your patch does what it claims to do, so codewise
I'm happy to say Reviewed-by: Daniel P. Berrangé <berrange@xxxxxxxxxx>,
but I'd rather not merge it without a deeper understanding.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|