Re: [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default"

"Linux regression tracking (Thorsten Leemhuis)" <regressions@xxxxxxxxxxxxx> · Thu, 15 Aug 2024 09:14:27 +0200

[side note: the message I have been replying to at least when downloaded
from lore has two message-ids, one of them identical two a older
message, which is why this looks odd in the lore archives:
https://lore.kernel.org/all/20240511031404.30903-1-xuanzhuo@xxxxxxxxxxxxxxxxx/]

On 14.08.24 08:59, Michael S. Tsirkin wrote:
> Note: Xuan Zhuo, if you have a better idea, pls post an alternative
> patch.
> 
> Note2: untested, posting for Darren to help with testing.
> 
> Turns out unconditionally enabling premapped 
> virtio-net leads to a regression on VM with no ACCESS_PLATFORM, and with
> sysctl net.core.high_order_alloc_disable=1
> 
> where crashes and scp failures were reported (scp a file 100M in size to VM):
> [...]

TWIMC, there is a regression report on lore and I wonder if this might
be related or the same problem, as it also mentioned a "get_swap_device:
Bad swap file entry" error:
https://bugzilla.kernel.org/show_bug.cgi?id=219154

To quote:

"""
Hello,

I've encountered repeated crashes or freezes when a KVM VM receives
large amounts of data over the network while the system is under memory
load and performing I/O operations. The crashes sometimes occur in the
filesystem code (ext4 and btrfs, at least), but they also happen in
other locations.

This issue occurs on my custom builds using kernel versions v6.10 to
v6.11-rc2, with virtio network and disk drivers, and either Ubuntu 22.04
or Debian 12 user space.

The same kernel build did not crash on an Azure VM, which does not use
the virtio network driver. Since this issue only appears when receiving
data, I suspect there could be an issue related to the virtio interface
or receive buffer handling.

This issue did not occur on the Debian backport kernel 6.9.7-1~bpo12+1
amd64.

Steps to Reproduce:
1. Setup a small VM on a KVM host.
   I tested this on an x86_64 KVM VM with 1 CPU, 512 MB RAM, 2 GB SWAP
(the smallest configuration from Vultr), using a Debian 12 user space,
virtio disk, and virtio net.
2. Induce high memory and I/O load. Run the following command:
   stress --vm 2 --hdd 1
   (Adjust --vm to to occupy all the RAM)
   This slows down the system but does not cause a crash.
3. Send large data to the VM.
   I used `iperf3 -s` on the VM and sent data using `iperf3 -c` from
another host. The system crashes within a few seconds to a few minutes.
(The reverse direction `iperf3 -c -R` did not cause a crash.)

The OOPS messages are mostly general protection faults, but sometimes I
see "Bad pagetable" or other errors, such as:
Oops: general protection fault, probably for non-canonical address
0x2f9b7fa5e2bde696: 0000 [#1] PREEMPT SMP PTI
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
Oops: Bad pagetable: 000d [#1] PREEMPT SMP PTI

In some cases, dmesg contains something like:
UBSAN: shift-out-of-bounds in lib/xarray.c:158:34

When the system freezes without crash, I sometimes found BUGON messages
in some cases, such as:
get_swap_device: Bad swap file entry 3403b0f5b2584992
BUG: Bad page map in process stress  pte:c42f93fac0299e1d pmd:0d9b2047
BUG: Bad rss-counter-state mm:000000004df3dd9a type:MM_ANONPAGES val:2
BUG: Bad rss-counter-state mm:000000004df3dd9a type:MM_SWAPENTS val:-1

Thanks.
"""

Ciao, Thorsten