Re: [PATCH v6 1/2] mm: migration: fix the FOLL_GET failure on following huge page

Michael Ellerman <mpe@xxxxxxxxxxxxxx> · Fri, 26 Aug 2022 23:07:12 +1000

Mike Kravetz <mike.kravetz@xxxxxxxxxx> writes:
> On 08/19/22 21:22, Michael Ellerman wrote:
>> Mike Kravetz <mike.kravetz@xxxxxxxxxx> writes:
>> > On 08/16/22 22:43, Andrew Morton wrote:
>> >> On Wed, 17 Aug 2022 03:31:37 +0000 "Wang, Haiyue" <haiyue.wang@xxxxxxxxx> wrote:
>> >>
>> >> > > >  		}
>> >> > >
>> >> > > It would be better to fix this for real at those three client code sites?
>> >> >
>> >> > Then 5.19 will break for a while to wait for the final BIG patch ?
>> >>
>> >> If that's the proposal then your [1/2] should have had a cc:stable and
>> >> changelog words describing the plan for 6.0.
>> >>
>> >> But before we do that I'd like to see at least a prototype of the final
>> >> fixes to s390 and hugetlb, so we can assess those as preferable for
>> >> backporting.  I don't think they'll be terribly intrusive or risky?
>> >
>> > I will start on adding follow_huge_pgd() support.  Although, I may need
>> > some help with verification from the powerpc folks, as that is the only
>> > architecture which supports hugetlb pages at that level.
>> >
>> > mpe any suggestions?
>>
>> I'm happy to test.
>>
>> I have a system where I can allocate 1GB huge pages.
>>
>> I'm not sure how to actually test this path though. I hacked up the
>> vm/migration.c test to allocate 1GB hugepages, but I can't see it going
>> through follow_huge_pgd() (using ftrace).
>
> I think you needed to use 16GB to trigger this code path.  Anshuman introduced
> support for page offline (and migration) at this level in commit 94310cbcaa3c
> ("mm/madvise: enable (soft|hard) offline of HugeTLB pages at PGD level").
> When asked about the use case, he mentioned:
>
> "Yes, its in the context of 16GB pages on POWER8 system where all the
>  gigantic pages are pre allocated from the platform and passed on to
>  the kernel through the device tree. We dont allocate these gigantic
>  pages on runtime."

That was true, but isn't anymore.

I must have been insufficiently caffeinated the other day. On our newer
machines 1GB is the largest huge page size, but it's obviously way too
small to sit at the PGD level. So that was a waste of my time :)

We used to support 16GB at the PGD level, but we reworked the page table
geometry a few years ago, and now they sit at the PUD level on machines
that support 16GB pages:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba95b5d0359609b4ec8010f77c40ab3c595a6ac6

Note the author :}
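
To make the geometry argument concrete, here is a rough sketch of the
arithmetic. The struct, the enum and the function names are made up for
illustration and are not kernel code, and the index widths used in the note
after the block are assumptions quoted from memory, not values checked
against the tree. The idea is only that a huge page can sit at a given level
when one entry at that level covers exactly the huge page size.

```c
/* Hypothetical helper, not kernel code: index widths of one geometry. */
struct pt_geometry {
	unsigned int page_shift;	/* log2 of the base page size */
	unsigned int pte_bits;		/* index bits at the PTE level */
	unsigned int pmd_bits;		/* index bits at the PMD level */
	unsigned int pud_bits;		/* index bits at the PUD level */
};

enum pt_level { PT_NONE, PT_PMD, PT_PUD, PT_PGD };

/* log2 of the address range covered by a single PMD entry. */
static inline unsigned int pmd_entry_shift(const struct pt_geometry *g)
{
	return g->page_shift + g->pte_bits;
}

/* log2 of the address range covered by a single PUD entry. */
static inline unsigned int pud_entry_shift(const struct pt_geometry *g)
{
	return pmd_entry_shift(g) + g->pmd_bits;
}

/* log2 of the address range covered by a single PGD entry. */
static inline unsigned int pgd_entry_shift(const struct pt_geometry *g)
{
	return pud_entry_shift(g) + g->pud_bits;
}

/*
 * Return the level whose single entry covers exactly 1 << huge_shift
 * bytes, or PT_NONE if no level matches in this geometry.
 */
enum pt_level huge_page_level(const struct pt_geometry *g,
			      unsigned int huge_shift)
{
	if (huge_shift == pmd_entry_shift(g))
		return PT_PMD;
	if (huge_shift == pud_entry_shift(g))
		return PT_PUD;
	if (huge_shift == pgd_entry_shift(g))
		return PT_PGD;
	return PT_NONE;
}
```

Assuming 64K base pages and index widths of 8/5/5 bits (PTE/PMD/PUD) for
the old hash geometry, a 16GB page (shift 34) comes out at PT_PGD. With
8/10/10, which is my recollection of the reworked geometry, the same page
comes out at PT_PUD. With a radix-like 5/9/9 layout a 1GB page (shift 30)
is also PT_PUD, and a single PGD entry would cover 512GB.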

So the good news is we no longer have any configuration where a huge
page entry is expected in the PGD. So we can drop our pgd_huge()
definitions, and ours are the last non-zero definitions, so it can all
go away I think.

I'll send a patch to remove the powerpc pgd_huge() definitions after
I've run it through some tests.

cheers