On Wed, 12 Feb 2025 17:55:18 +0100
David Hildenbrand <david@xxxxxxxxxx> wrote:

> On 31.01.25 12:24, Claudio Imbrenda wrote:
> > Move gmap related functions from kernel/uv into kvm.
> > 
> > Create a new file to collect gmap-related functions.
> > 
> > Reviewed-by: Janosch Frank <frankja@xxxxxxxxxxxxx>
> > Reviewed-by: Christoph Schlameuss <schlameuss@xxxxxxxxxxxxx>
> > [fixed unpack_one(), thanks mhartmay@xxxxxxxxxxxxx]
> > Link: https://lore.kernel.org/r/20250123144627.312456-6-imbrenda@xxxxxxxxxxxxx
> > Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> > Message-ID: <20250123144627.312456-6-imbrenda@xxxxxxxxxxxxx>
> > ---
> 
> This patch breaks large folio splitting because you end up un-refing
> the wrong folios after a split; I tried to make it work, but either
> because of other changes in this patch (or in others), I
> cannot get it to work and have to give up for today.

Yes, I had also noticed that, and I already have a fix ready. In fact,
my fix was exactly like yours, except that I no longer pass the struct
folio to kvm_s390_wiggle_split_folio(); instead I pass only a page, use
page_folio() at the beginning, and call
split_huge_page_to_list_to_order() directly instead of split_folio().
(A rough sketch of that variant is at the end of this mail.)

Unfortunately, that fix does not resolve the issue I'm seeing... but
putting printks everywhere makes it go away, so it looks like a race
somewhere.

> 
> Running a simple VM backed by memory-backend-memfd
> (which now uses large folios) no longer works (random refcount underflows /
> freeing of wrong pages).
> 
> The following should be required (but according to my testing insufficient):
> 
> From 71fafff5183c637f20830f6f346e8c9f3eafeb59 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@xxxxxxxxxx>
> Date: Wed, 12 Feb 2025 16:00:32 +0100
> Subject: [PATCH] KVM: s390: fix splitting of large folios
> 
> If we end up splitting the large folio, doing a put_page() will drop the
> wrong reference (unless it was the head page), because we are holding a
> reference to the old (large) folio. Similarly, doing another
> page_folio() after the split is wrong.
> 
> The result is that we end up freeing a page that is still mapped+used.
> 
> To fix it, let's pass the page and call split_huge_page() instead.
> 
> As an alternative, we could convert all code to use folios, and to
> look up the page again from the page table after our split; however, in
> the context of non-uniform folio splits [1], it makes sense to pass the
> page where we really want to split.
> 
> [1] https://lkml.kernel.org/r/20250211155034.268962-1-ziy@xxxxxxxxxx
> 
> Fixes: 5cbe24350b7d ("KVM: s390: move pv gmap functions into kvm")
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> ---
>  arch/s390/include/asm/gmap.h |  3 ++-
>  arch/s390/kvm/gmap.c         |  4 ++--
>  arch/s390/mm/gmap.c          | 13 +++++++++++--
>  3 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index 4e73ef46d4b2a..0efa087778135 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -139,7 +139,8 @@ int s390_replace_asce(struct gmap *gmap);
>  void s390_uv_destroy_pfns(unsigned long count, unsigned long *pfns);
>  int __s390_uv_destroy_range(struct mm_struct *mm, unsigned long start,
>  			    unsigned long end, bool interruptible);
> -int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool split);
> +int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio,
> +				struct page *page, bool split);
>  unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level);
>  
>  /**
> diff --git a/arch/s390/kvm/gmap.c b/arch/s390/kvm/gmap.c
> index 02adf151d4de4..c2523c63afea3 100644
> --- a/arch/s390/kvm/gmap.c
> +++ b/arch/s390/kvm/gmap.c
> @@ -72,7 +72,7 @@ static int __gmap_make_secure(struct gmap *gmap, struct page *page, void *uvcb)
>  		return -EFAULT;
>  	if (folio_test_large(folio)) {
>  		mmap_read_unlock(gmap->mm);
> -		rc = kvm_s390_wiggle_split_folio(gmap->mm, folio, true);
> +		rc = kvm_s390_wiggle_split_folio(gmap->mm, folio, page, true);
>  		mmap_read_lock(gmap->mm);
>  		if (rc)
>  			return rc;
> @@ -100,7 +100,7 @@ static int __gmap_make_secure(struct gmap *gmap, struct page *page, void *uvcb)
>  	/* The folio has too many references, try to shake some off */
>  	if (rc == -EBUSY) {
>  		mmap_read_unlock(gmap->mm);
> -		kvm_s390_wiggle_split_folio(gmap->mm, folio, false);
> +		kvm_s390_wiggle_split_folio(gmap->mm, folio, page, false);
>  		mmap_read_lock(gmap->mm);
>  		return -EAGAIN;
>  	}
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 94d9277858009..3180ad90a255a 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2631,12 +2631,18 @@ EXPORT_SYMBOL_GPL(s390_replace_asce);
>   * kvm_s390_wiggle_split_folio() - try to drain extra references to a folio and optionally split
>   * @mm: the mm containing the folio to work on
>   * @folio: the folio
> + * @page: the folio page where to split the folio
>   * @split: whether to split a large folio
>   *
> + * If a split of a large folio was requested, the original provided folio must
> + * no longer be used if this function returns 0. The new folio must be looked
> + * up using page_folio(), to which we will then hold a reference.
> + *
>   * Context: Must be called while holding an extra reference to the folio;
>   * the mm lock should not be held.
>   */
> -int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool split)
> +int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio,
> +				struct page *page, bool split)
>  {
>  	int rc;
>  
> @@ -2645,7 +2651,10 @@ int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct folio *folio, bool
>  	lru_add_drain_all();
>  	if (split) {
>  		folio_lock(folio);
> -		rc = split_folio(folio);
> +		/* Careful: split_folio() would be wrong. */
> +		rc = split_huge_page(page);
> +		if (!rc)
> +			folio = page_folio(page);
>  		folio_unlock(folio);
>  
>  		if (rc != -EBUSY)
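
To make the variant I mentioned above a bit more concrete, here is an
untested sketch (not the actual patch): only the split path is spelled
out, the unchanged parts of the function are indicated by comments, and
split_huge_page_to_list_to_order(page, NULL, 0) is simply the uniform
order-0 split that split_huge_page() expands to.

int kvm_s390_wiggle_split_folio(struct mm_struct *mm, struct page *page,
				bool split)
{
	struct folio *folio = page_folio(page);
	int rc;

	/* ... unchanged preamble ... */
	lru_add_drain_all();
	if (split) {
		folio_lock(folio);
		/*
		 * Uniform split to order 0 around @page. On success our
		 * reference ends up on the new folio containing @page
		 * (see the kernel-doc in your patch), and that folio is
		 * the one still locked, so re-derive it before unlocking.
		 */
		rc = split_huge_page_to_list_to_order(page, NULL, 0);
		if (!rc)
			folio = page_folio(page);
		folio_unlock(folio);

		if (rc != -EBUSY)
			return rc;
	}
	/* ... rest of the function unchanged ... */
	return -EAGAIN;	/* placeholder so the sketch compiles */
}

The caller side would then only need to pass the page, exactly as your
patch already does, without having to keep the pre-split folio pointer
around at all.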