On 31.01.25 12:24, Claudio Imbrenda wrote:
> Move gmap related functions from kernel/uv into kvm.
>
> Create a new file to collect gmap-related functions.
>
> Reviewed-by: Janosch Frank <frankja@xxxxxxxxxxxxx>
> Reviewed-by: Christoph Schlameuss <schlameuss@xxxxxxxxxxxxx>
> [fixed unpack_one(), thanks mhartmay@xxxxxxxxxxxxx]
> Link: https://lore.kernel.org/r/20250123144627.312456-6-imbrenda@xxxxxxxxxxxxx
> Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> Message-ID: <20250123144627.312456-6-imbrenda@xxxxxxxxxxxxx>
> ---

This patch breaks large folio splitting, because you end up un-refing the
wrong folios after a split; I tried to make it work, but whether because of
other changes in this patch or in other patches, I cannot get it to work and
have to give up for today.

Running a simple VM backed by memory-backend-memfd (which now uses large
folios) no longer works: I get random refcount underflows and freeing of the
wrong pages.
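
To illustrate what goes wrong (a minimal sketch of the pattern only, not the
actual gmap code; split_at() is a made-up name):

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/huge_mm.h>

/* Made-up helper; the caller holds a reference on the (large) folio. */
static int split_at(struct page *page)
{
	struct folio *folio = page_folio(page);
	int rc;

	folio_lock(folio);
	/*
	 * split_folio(folio) would split at the head page: afterwards,
	 * "folio" covers only the head pages, the reference we hold ends
	 * up on that head folio, and a later put_page(page) underflows
	 * the refcount of a still-mapped page.
	 *
	 * Splitting at @page and re-deriving the folio keeps the
	 * reference we hold and the folio we operate on in sync.
	 */
	rc = split_huge_page(page);
	if (!rc)
		folio = page_folio(page);
	folio_unlock(folio);
	return rc;
}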

The following fix should be required (but is, according to my testing, not
sufficient):

From 71fafff5183c637f20830f6f346e8c9f3eafeb59 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@xxxxxxxxxx>
Date: Wed, 12 Feb 2025 16:00:32 +0100
Subject: [PATCH] KVM: s390: fix splitting of large folios

If we end up splitting the large folio, doing a put_page() will drop the
wrong reference (unless it was the head page), because we are holding a
reference to the old (large) folio. Similarly, doing another page_folio()
after the split is wrong.

The result is that we end up freeing a page that is still mapped+used.

To fix it, let's pass the page and call split_huge_page() instead.

As an alternative, we could convert all code to use folios and look up the
page again from the page table after our split; however, in the context of
non-uniform folio splits [1], it makes sense to pass the page where we
really want to split.

[1] https://lkml.kernel.org/r/20250211155034.268962-1-ziy@xxxxxxxxxx

Fixes: 5cbe24350b7d ("KVM: s390: move pv gmap functions into kvm")
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
arch/s390/include/asm/gmap.h | 3 ++-
arch/s390/kvm/gmap.c | 4 ++--
arch/s390/mm/gmap.c | 13 +++++++++++--
3 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index 4e73ef46d4b2a..0efa087778135 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -139,7 +139,8 @@ int s390_replace_asce(struct gmap *gmap);
void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns);
int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
unsigned long end, bool interruptible);
-int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool split);
+int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio,
+ struct page *page, bool split);
unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level);

/**
diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
index 02adf151d4de4..c2523c63afea3 100644
--- a/arch/s390/kvm/gmap.c
+++ b/arch/s390/kvm/gmap.c
@@ -72,7 +72,7 @@ static int __gmap_make_secure(struct gmap *gmap, struct page *page, void *uvcb)
return -EFAULT;
if (folio_test_large(folio)) {
mmap_read_unlock(gmap->mm);
- rc = kvm_s390_wiggle_split_folio(gmap->mm, folio, true);
+ rc = kvm_s390_wiggle_split_folio(gmap->mm, folio, page, true);
mmap_read_lock(gmap->mm);
if (rc)
return rc;
@@ -100,7 +100,7 @@ static int __gmap_make_secure(struct gmap *gmap, struct page *page, void *uvcb)
/* The folio has too many references, try to shake some off */
if (rc == -EBUSY) {
mmap_read_unlock(gmap->mm);
- kvm_s390_wiggle_split_folio(gmap->mm, folio, false);
+ kvm_s390_wiggle_split_folio(gmap->mm, folio, page, false);
mmap_read_lock(gmap->mm);
return -EAGAIN;
}
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 94d9277858009..3180ad90a255a 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2631,12 +2631,18 @@ EXPORT_SYMBOL_GPL(s390_replace_asce);
* kvm_s390_wiggle_split_folio() - try to drain extra references to a folio and optionally split
* @mm: the mm containing the folio to work on
* @folio: the folio
+ * @page: the page in the folio at which to split
* @split: whether to split a large folio
*
+ * If a split of a large folio was requested, the original provided folio must
+ * no longer be used if this function returns 0. The new folio must be looked
+ * up using page_folio(), to which we will then hold a reference.
+ *
* Context: Must be called while holding an extra reference to the folio;
* the mm lock should not be held.
*/
-int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool split)
+int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio,
+ struct page *page, bool split)
{
int rc;
@@ -2645,7 +2651,10 @@ int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool
lru_add_drain_all();
if (split) {
folio_lock(folio);
- rc = split_folio(folio);
+ /* Careful: split_folio() would be wrong. */
+ rc = split_huge_page(page);
+ if (!rc)
+ folio = page_folio(page);
folio_unlock(folio);

if (rc != -EBUSY)
--
2.48.1

With that fix applied, the VM still freezes at some point.

There are other issues with this patch: you are no longer holding the PTL
while doing the conversion, so GUP-slow can now run concurrently while pages
are being converted.
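
For this to be safe, the conversion would roughly have to happen with the
PTL held, along these lines (a simplified sketch; convert_page() is a
hypothetical stand-in for the actual UV conversion step):

#include <linux/mm.h>

/* Hypothetical stand-in for the actual UV page-conversion step. */
static int convert_page(struct page *page);

static int convert_under_ptl(struct mm_struct *mm, pmd_t *pmd,
			     unsigned long addr, struct page *page)
{
	spinlock_t *ptl;
	pte_t *ptep;
	int rc;

	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!ptep)
		return -EAGAIN;
	/* (Real code would also recheck that the PTE still maps @page.) */

	/*
	 * With the PTL held, GUP-slow cannot walk this PTE and take a new
	 * reference behind our back; without it, the refcount that the
	 * conversion observes can change at any time.
	 */
	rc = convert_page(page);
	pte_unmap_unlock(ptep, ptl);
	return rc;
}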

The refcount freezing (which I previously flagged as problematic) is now
*more wrong* than before.
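
For reference, the freeze pattern in question looks roughly like this
(simplified sketch, not the exact code); computing "expected" is only sound
if nothing can take a new reference concurrently, which is exactly what the
PTL used to guarantee:

#include <linux/page_ref.h>

static int freeze_and_convert(struct folio *folio, int expected)
{
	int rc = -EBUSY;

	/*
	 * Atomically replace the refcount with 0 iff it equals
	 * "expected"; while frozen, nobody can take a new reference.
	 */
	if (folio_ref_freeze(folio, expected)) {
		/* ... perform the conversion while frozen ... */
		folio_ref_unfreeze(folio, expected);
		rc = 0;
	}
	return rc;
}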

You buried way too many non-subtle changes in this patch behind a two-line
patch description, I'm afraid ...

--
Cheers,
David / dhildenb