On Thu, Dec 07, 2023 at 02:04:26PM -0700, Jens Axboe wrote:
> On Mon, 14 Aug 2023 15:41:00 +0100, Matthew Wilcox (Oracle) wrote:
>> The special casing was originally added in pre-git history; reproducing
>> the commit log here:
>>
>>> commit a318a92567d77
>>> Author: Andrew Morton <akpm@xxxxxxxx>
>>> Date: Sun Sep 21 01:42:22 2003 -0700
>>>
>>> [PATCH] Speed up direct-io hugetlbpage handling
>>>
>>> This patch short-circuits all the direct-io page dirtying logic for
>>> higher-order pages. Without this, we pointlessly bounce BIOs up to
>>> keventd all the time.
>>
>> [...]
>
> Applied, thanks!
>
> [1/1] block: Remove special-casing of compound pages
>       commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55

This commit results in a change of behavior for QEMU VMs backed by
hugepages that open their VM disk image file with O_DIRECT (QEMU
cache=none or cache.direct=on options). When the VM shuts down and the
QEMU process exits, one or two hugepages may fail to free correctly. It
appears to be a race, as it doesn't happen every time.

From debugging on 6.8-rc6, when it occurs, the hugepage that fails to
free has a non-zero refcount when it hits the folio_put_testzero(folio)
test in release_pages(). On a failure test iteration with 1 GiB
hugepages, the failing folio had a mapcount of 0, refcount of 35, and
folio_maybe_dma_pinned was true.

The problem only occurs when the VM disk image file is opened with
O_DIRECT. When using QEMU cache=writeback or cache.direct=off options,
it does not occur. We first noticed it on the 6.1.y stable kernel when
this commit landed there (6.1.75).
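
One way to see whether pages were left behind is to compare the pool's
total and free counts around a test run. The helper below is only a
sketch and was not part of the debugging above: the function name
hugepages_in_use is made up for illustration, and it assumes that a
page which failed to free stays counted as in use in the sysfs counters
nr_hugepages and free_hugepages.

```shell
#!/bin/sh
# Hypothetical helper: print how many huge pages in a pool are not free.
# $1: sysfs directory of the pool; defaults to the 2 MiB pool.
hugepages_in_use() {
    pool="${1:-/sys/kernel/mm/hugepages/hugepages-2048kB}"
    # nr_hugepages is the pool size, free_hugepages the unused part.
    total=$(cat "$pool/nr_hugepages")
    free=$(cat "$pool/free_hugepages")
    echo $((total - free))
}
```

Calling it before starting QEMU and again after the hugetlbfs backing
file has been removed should print the same number on an otherwise idle
system; a higher number afterwards would be consistent with the failure
described here.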

A very simple reproducer without KVM (just boot VM up, then shut it
down):

echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
qemu-system-x86_64 \
  -cpu qemu64 \
  -m 1024 \
  -nographic \
  -mem-path /dev/hugepages/vm00 \
  -mem-prealloc \
  -drive file=test.qcow2,if=none,cache=none,id=drive0 \
  -device virtio-blk-pci,drive=drive0,id=disk0,bootindex=1
rm -f /dev/hugepages/vm00

Some testing notes:

* occurs with 6.1.75, 6.6.14, 6.8-rc6, and linux-next-20240229
* occurs with 1 GiB and 2 MiB huge pages, with both hugetlbfs and memfd
* occurs with QEMU 8.0.y, 8.1.y, 8.2.y, and master
* occurs with (-enable-kvm -cpu host) or without (-cpu qemu64) KVM

Thanks for your time!

Greg