On Fri, Oct 25, 2024 at 12:18:14AM -0400, Laine Stump wrote: > Many long years ago (April 2010), soon after "vhost" in-kernel packet > processing was added to the virtio-net driver, people running RHEL5 > virtual machines with a virtio-net interface connected via a libvirt > virtual network noticed that when vhost packet processing was enabled, > their VMs could no longer get an IP address via DHCP - the guest was > ignoring the DHCP response packets sent by the host. > > (I've been informed by danpb that the same issue had been encountered, > and "fixed" even earlier than that, in 2006, with Xen as the > hypervisor.) > > The "gory details" of the 2010 discussion are chronicled here: > > https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001835.html > > but basically it was because the checksum of packets wasn't being > fully computed on the host side (because QEMU on the host and the NIC > driver in the guest had agreed between themselves to turn off > checksums because they were unnecessary due to the "link" between the > two being entirely in local memory and not a physical cable), but > > 1) a partial checksum had been put into the packets at some point by > "someone" > > 2) the "don't use checksums" info was known by the guest kernel, which > would properly ignore the "bad" checksum), and > > 3) the packets were being read by the dhclient application on the > guest side with a "raw" socket (thus bypassing the guest kernel UDP > processing that would have known the checksum was irrelevant)), > > The "fix" for this ended up being two-tiered: > > 1) The ISC DHCP package (which contains the aforementioned dhclient > program) made a fix to their dhclient code which caused it to accept > packets anyway even if they didn't have a proper checksum (NB: that's > not a full explanation, and possibly not accurate). This fixed the > problem for guests with an updated dhclient. Here is the code with the > fix to ISC DHCP: > > https://github.com/isc-projects/dhcp/blob/master/common/packet.c#L365 > > This fixed the issue for any new/updated guests that had the fixed > dhclient, but it didn't solve the problem for existing/old guest > images that didn't/couldn't get their dhclient updated. This brings us > to: > > 2) iptables added a new "CHECKSUM" target and "--checksum-fill" > action: > > http://patchwork.ozlabs.org/patch/58525/ > > and libvirt added an iptables rule for each virtual network to match > DHCP response packets and perform --checksum-fill. This way by the > time dhclient on the guest reads the raw packet, the checksum would be > corrected, and the packet would be accepted. This was pushed upstream > in libvirt commit v0.8.2-142-gfd5b15ff1a. > > The word at the time from those more knowledgeable than me was that > the bad checksum problem was really specific to ISC's dhclient running > on Linux, and so once their fix was in use everywhere dhclient was > used, bad checksums would be a thing of the past and the > --checksum-fill iptables rules would no longer be needed (but would > otherwise be harmless if they were still there). (Plot twist: the > dhclient code in fix (1) above apparently is on a Linux-only code path > - this is very important later!) > > Based on this information (and also due to the opinion that fixing it > by having iptables modify the packet checksum was really the wrong way > to permanently fix things, i.e. an "ugly hack"), the nftables > developers made the decision to not implement an equivalent to > --checksum-fill in nftables. As a result, when I wrote the nftables > firewall backend for libvirt virtual networks earlier this year, it > didn't add in any rule to "fix" broken UDP checksums (since there was > apparently no equivalent in nftables and, after all, that was fixed > somewhere else 14 years ago, right???) > > But last week, when Rich Jones was doing routine testing using a Fedora > 40 host (the first Fedora release to use the nftables backend of libvirt's > network driver by default) and a FreeBSD guest, for "some strange > reason", the FreeBSD guest was unable to get an IP address from DHCP!! > > https://www.spinics.net/linux/fedora/libvirt-users/msg14356.html > > A few quick tests proved that it was the same old "bad checksum" > problem from 2010 come back to haunt us - it wasn't a Linux-only issue > after all. > > Phil Sutter and Eric Garver (nftables people) pointed out that, while > nftables doesn't have an action that will *compute* the checksum of a > packet, it *does* have an action that will set the checksum to 0, and > suggested we try adding a "zero the checksum" rule for dhcp response > packets to our nftables ruleset. (Why? Because a checksum value of 0 > in a IPv4 UDP packet is defined by RFC768 to mean "no checksum > generated", implying "checksum not needed"). It turns out that this > works - dhclient properly recognizes that a 0 checksum means "don't > bother with the checksum", and accepts the packet as valid. > > So to once again fix this timeless bug, this patch adds such a > checksum zeroing rule to the nftables rules setup for each virtual > network. > > This has been verified (on a Fedora 40 host) to fix DHCP with FreeBSD > guests, while not breaking it for Fedora or Windows (10) guests. You can add OpenBSD to that list, as I tested that too. > > Fixes: b89c4991daa0ee9371f10937fab3b03c5ffdabc6 > Reported-by: Rich Jones <rjones@xxxxxxxxxx> > Fix-Suggested-by: Eric Garver <egarver@xxxxxxxxxx> > Fix-Suggested-by: Phil Sutter <psutter@xxxxxxxxxx> > Signed-off-by: Laine Stump <laine@xxxxxxxxxx> > Reviewed-by: Daniel P. Berrangé <berrange@xxxxxxxxxx> > --- > > (NB: jdenemar reminded me this is a bugfix, so we don't need to rush > to get it pushed before he freezes for the RC1 snapshot. I'm happy > with it as is though, if anyone is awake early enough and wants to > push it before he snapshots RC1. Otherwise I'll just push it later and > it will be in RC2) > > Changes from V1: > > * informational errors/omissions in commit log message fixed > > src/network/network_nftables.c | 69 +++++++++++++++++++ > tests/networkxml2firewalldata/base.nftables | 14 ++++ > .../forward-dev-linux.nftables | 16 +++++ > .../isolated-linux.nftables | 16 +++++ > .../nat-default-linux.nftables | 16 +++++ > .../nat-ipv6-linux.nftables | 16 +++++ > .../nat-ipv6-masquerade-linux.nftables | 16 +++++ > .../nat-many-ips-linux.nftables | 16 +++++ > .../nat-port-range-ipv6-linux.nftables | 16 +++++ > .../nat-port-range-linux.nftables | 16 +++++ > .../nat-tftp-linux.nftables | 16 +++++ > .../route-default-linux.nftables | 16 +++++ > 12 files changed, 243 insertions(+) Reviewed-by: Daniel P. Berrangé <berrange@xxxxxxxxxx> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|