On 10/14/24 12:06 PM, Daniel P. Berrangé wrote:
On Mon, Oct 14, 2024 at 04:55:37PM +0100, Daniel P. Berrangé wrote:
On Mon, Oct 14, 2024 at 04:37:42PM +0100, Richard W.M. Jones wrote:
On Mon, Oct 14, 2024 at 10:46:22AM -0400, Laine Stump wrote:
On 10/14/24 5:35 AM, Richard W.M. Jones wrote:
On Mon, Oct 14, 2024 at 09:52:13AM +0100, Daniel P. Berrangé wrote:
Urgh, I wonder if this is fallout from switching to NFT instead of iptables.
I can list the firewall rules if you tell me what I'm looking for ...
IIUC, the NFT kernel maintainers didn't implement checksum fixup rules,
since they believe that all modern distros would have long ago fixed their
bugs wrt mangled checksums.
That's the first thing that came to my mind too - maybe RHEL5
*isn't* the only guest OS that has this problem. (I certainly hope
that isn't the case :-/)
There are two ways to test out this theory:
1) change the setting of "firewall_backend" in
/etc/libvirt/network.conf to "iptables" and restart virtnetworkd
(if that does work, then switch back to nftables, restart
virtnetworkd, and test again just to make sure the issue wasn't
caused by some out-of-place rule)
I changed the setting between nftables and iptables a few times and I
can confirm that your theory seems to be correct.
iptables =>
"5 bad udp checksums in 5 packets" message is NOT seen
FreeBSD gets an immediate DHCPOFFER and boots quickly with network
nftables =>
FreeBSD sends 5 DHCPDISCOVER messages
"5 bad udp checksums in 5 packets" reappears
FreeBSD does NOT see DHCPOFFER, although it does seem to remember
the offer from the previous boot, so it does get a network
connection in the end.
or
2) tell qemu to set up the virtio-net device to do its packet
processing in userspace rather than the kernel. You do this by
adding
<driver name='qemu'/>
to the <interface> section.
This also works (with nftables).
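For anyone unsure where that element goes, here is a rough Python sketch
that adds it to an <interface> element. The sample XML, the network name
'default', and the helper name force_userspace_backend() are made up for
illustration; a real definition would come from "virsh dumpxml" and would
carry more child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical <interface> element, trimmed down for illustration only.
INTERFACE_XML = """
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
</interface>
"""


def force_userspace_backend(interface_xml: str) -> str:
    """Return the interface XML with <driver name='qemu'/> present."""
    iface = ET.fromstring(interface_xml)
    driver = iface.find("driver")
    if driver is None:
        # No <driver> child yet, so create one.
        driver = ET.SubElement(iface, "driver")
    # name='qemu' asks for the userspace backend instead of vhost-net.
    driver.set("name", "qemu")
    return ET.tostring(iface, encoding="unicode")


patched = force_userspace_backend(INTERFACE_XML)
print(patched)
```

The helper reuses an existing <driver> element if there is one, so
running it twice does not add a second copy.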
If I understand the trace correctly, the bad checksum originates on
the Linux host (the reply sent by dnsmasq).
I need to try it again to verify, but my recollection is that (when
you're using virtio-net with default settings) the checksums of DHCP
packets in one direction or the other *always* show up in tcpdump as
having bad checksums, but they still end up getting to the other end
with a proper checksum. Sometime in the distant past I *may have*
had it explained to me why this happens, but I don't recall now.
Anyway, I'm just saying this so that you know the validity of the
UDP checksum shouldn't be used as an indicator of whether or not
things are "working".
I have to say I also don't really understand what's happening here.
Isn't the Linux host sending DHCPOFFER? Why doesn't it set the UDP
checksum correctly and/or why would tcpdump report it wrongly if it is
setting it?
Here are the original gory details
https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001835.html
TL;DR: we have checksum offload running so the host doesn't fill
in any checksums, but the DHCP client then tries to validate the
nonexistent checksum. Boom.
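To make the failure concrete, here is a minimal Python sketch of the
standard IPv4 UDP checksum (RFC 768) and of a client that always
validates it. The addresses, the payload, and the assumption that an
offloaded packet carries only a partial pseudo-header sum in the
checksum field are illustrative; none of this is taken from dhclient
or the kernel.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum, padding odd-length data with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol 17, UDP length."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, 17, udp_len))


def build_udp(sport: int, dport: int, payload: bytes, checksum: int = 0) -> bytes:
    """UDP header (ports, length, checksum) followed by the payload."""
    return struct.pack("!HHHH", sport, dport, 8 + len(payload), checksum) + payload


def full_checksum(src: str, dst: str, segment: bytes) -> int:
    """Checksum over pseudo-header + segment with the checksum field zeroed."""
    zeroed = segment[:6] + b"\x00\x00" + segment[8:]
    csum = ~ones_complement_sum(pseudo_header(src, dst, len(segment)) + zeroed) & 0xFFFF
    return csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


def strict_verify(src: str, dst: str, segment: bytes) -> bool:
    """A client that always validates: the total must fold to 0xFFFF."""
    return ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment) == 0xFFFF


SRC, DST = "192.168.122.1", "192.168.122.100"  # hypothetical addresses
PAYLOAD = b"DHCPOFFER"                         # stand-in, not a real DHCP message

plain = build_udp(67, 68, PAYLOAD)
good = build_udp(67, 68, PAYLOAD, full_checksum(SRC, DST, plain))

# Assumption: with offload, the field holds only the pseudo-header sum and
# the rest is expected to be filled in later by hardware that never runs.
partial = ones_complement_sum(pseudo_header(SRC, DST, len(plain)))
offloaded = build_udp(67, 68, PAYLOAD, partial)

print(strict_verify(SRC, DST, good))       # True
print(strict_verify(SRC, DST, offloaded))  # False
```

A client like strict_verify() drops the second packet, which would
match the "bad udp checksums" counter and the missing DHCPOFFER
reported above.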
ISC DHCP fixed this in
https://github.com/isc-projects/dhcp/commit/7ff6ae5aa85754119319def3c7f225a40da299c4
and if I'm interpreting this patch correctly, it is only fixed on
Linux - most changes are in lpf.c, which is "Linux Packet Filter",
and I'm assuming that codepath won't be used on *BSD.
If correct, then the idea that checksum fixup from iptables is
obsolete is incorrect, and we need it added to nftables for parity.
Requiring users to turn off the vhost-net feature is horrible, not
just for the user experience of having a broken VM out of
the box, but also for performance, as checksum offloading is a
good thing if you want fast networking.
Phil Sutter and Eric Garver suggested that we try 0'ing out the checksum
of these packets, which is something that nftables *can* do. Phil tried
it and it worked for him, so I tried it and it worked for me too. So
this weekend I made a patch that will add a rule like this:
nft -ae insert rule ip libvirt_network postroute_mangle \
oif virbr0 udp dport 68 counter udp checksum set 0
along with adding a single chain like this to contain all those rules:
nft add chain ip libvirt_network guest_mangle \
'{ type filter hook postrouting priority 0; policy accept; }'
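As a rough illustration of why zeroing helps: in UDP over IPv4
(RFC 768) an all-zero checksum field means the sender computed no
checksum, so a receiver has nothing to validate. The Python sketch
below mimics what the rule does to the header. The helper names and
the sample packet are made up, and whether any particular DHCP client
takes this path is an assumption based only on the test results
reported in this thread.

```python
import struct


def zero_udp_checksum(segment: bytes) -> bytes:
    """Mimic 'udp checksum set 0': overwrite bytes 6-7 of the UDP header."""
    return segment[:6] + b"\x00\x00" + segment[8:]


def checksum_field(segment: bytes) -> int:
    """Read the 16-bit checksum field from a UDP header."""
    return struct.unpack("!H", segment[6:8])[0]


def needs_validation(segment: bytes) -> bool:
    """Per RFC 768 (IPv4), a zero checksum field means 'no checksum sent'."""
    return checksum_field(segment) != 0


# Hypothetical DHCP reply (ports 67 -> 68) carrying a stale, non-zero
# checksum value such as offload might leave behind.
payload = b"DHCPOFFER"  # stand-in, not a real DHCP message
stale = struct.pack("!HHHH", 67, 68, 8 + len(payload), 0x75D9) + payload

fixed = zero_udp_checksum(stale)

print(needs_validation(stale))  # True: the client would check it and fail
print(needs_validation(fixed))  # False: nothing to check, packet accepted
```

Only the two checksum bytes change; ports, length and payload are left
alone.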
I've tested it with FreeBSD and Fedora guests and it works properly with
both. I posted the patch to devel@xxxxxxxxxxxxxxxxx
https://www.spinics.net/linux/fedora/libvir/msg249203.html
and am hoping that others can also test it to verify that it's not
*breaking* dhcp for any other guests (I personally don't have much in
the way of Windows guest images, or debian/ubuntu/suse/etc. I could spin
some up but it would probably be faster (and less work for me!) if other
people just tested with what they have).