On Wed, Oct 19, 2022 at 11:55 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 10/19/22 11:31, Yang Shi wrote:
> > On Tue, Oct 18, 2022 at 1:01 PM James Houghton <jthoughton@xxxxxxxxxx> wrote:
> > >
> > > This change is very similar to the change that was made for shmem [1],
> > > and it solves the same problem but for HugeTLBFS instead.
> > >
> > > Currently, when poison is found in a HugeTLB page, the page is removed
> > > from the page cache. That means that attempting to map or read that
> > > hugepage in the future will result in a new hugepage being allocated
> > > instead of notifying the user that the page was poisoned. As [1] states,
> > > this is effectively memory corruption.
> > >
> > > The fix is to leave the page in the page cache. If the user attempts to
> > > use a poisoned HugeTLB page with a syscall, the syscall will fail with
> > > EIO, the same error code that shmem uses. For attempts to map the page,
> > > the thread will get a BUS_MCEERR_AR SIGBUS.
> > >
> > > [1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
> > >
> > > Signed-off-by: James Houghton <jthoughton@xxxxxxxxxx>
> >
> > Thanks for the patch. Yes, we should do the same thing for hugetlbfs.
> > When I was working on shmem I did look into hugetlbfs too. But the
> > problem is that we make the whole hugetlb page unavailable even though
> > just one 4K sub-page is hwpoisoned. That may be fine for a 2M hugetlb
> > page, but a lot of memory may be wasted for a 1G hugetlb page,
> > particularly on the page fault path.
>
> One thing that complicated this a bit was the vmemmap optimization for
> hugetlb. However, I believe Naoya may have addressed this recently.
>
> > So I discussed this with Mike offline last year, and I was told Google
> > was working on PTE-mapped hugetlb pages. That should be able to solve
> > the problem, and we'd like to have the high-granularity hugetlb
> > mapping (HGM) support as the prerequisite.
>
> Yes, I went back in my notes and noticed it had been one year. No offense
> intended to James and his great work on HGM. However, in hindsight we
> should have fixed this in some way without waiting for an HGM-based
> solution.
>
> > There were some other details, but I can't remember all of them; I will
> > have to refresh my memory by rereading the email discussions...
>
> I think the complicating factor was the vmemmap optimization. As mentioned
> above, this may have already been addressed by Naoya in patches to
> indicate which sub-page(s) had the actual error.
>
> As Yang Shi notes, this patch makes the entire hugetlb page inaccessible.
> With some work, we could allow reads to everything but the sub-page with
> the error. However, this should be much easier with HGM, and we could
> potentially even allow page faults everywhere but the sub-page with the
> error.
>
> I still think it may be better to wait for HGM instead of trying to allow
> read access to all but the sub-page with the error now. But I am entirely
> open to other opinions, and I have no strong preference about which goes
> first.
>
> I plan to do a review of this patch a little later.
> --
> Mike Kravetz
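
For reference, below is a minimal userspace sketch (not part of the patch) of how the
post-patch behaviour described in the commit message could be observed: read() on the
poisoned range failing with EIO, and a fresh mapping of the range raising a
BUS_MCEERR_AR SIGBUS. It assumes a hugetlbfs mount at /dev/hugepages with a couple of
2MB hugepages reserved, CONFIG_MEMORY_FAILURE, and CAP_SYS_ADMIN for MADV_HWPOISON
injection; the file name and sizes are illustrative only.

#define _GNU_SOURCE
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE	(2UL << 20)	/* assumes 2MB default hugepage size */

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	/* With the patch, faulting on the poisoned hugepage should deliver
	 * SIGBUS with si_code == BUS_MCEERR_AR. */
	if (info->si_code == BUS_MCEERR_AR)
		write(STDOUT_FILENO, "SIGBUS BUS_MCEERR_AR\n", 21);
	_exit(0);
}

int main(void)
{
	struct sigaction sa = {
		.sa_sigaction	= sigbus_handler,
		.sa_flags	= SA_SIGINFO,
	};
	char buf[4096];
	char *map;
	int fd;

	sigaction(SIGBUS, &sa, NULL);

	fd = open("/dev/hugepages/hwpoison-test", O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		err(1, "open");

	map = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		err(1, "mmap");

	map[0] = 1;			/* fault the hugepage in */

	/* Inject an error into the first 4K sub-page of the hugepage. */
	if (madvise(map, getpagesize(), MADV_HWPOISON))
		err(1, "madvise(MADV_HWPOISON)");

	munmap(map, HPAGE_SIZE);

	/*
	 * With the patch, the poisoned hugepage stays in the page cache, so
	 * read() should fail with EIO (as shmem does). Without it, the page
	 * is gone and the range reads back as a hole.
	 */
	if (read(fd, buf, sizeof(buf)) < 0)
		printf("read: %s\n", strerror(errno));

	/*
	 * Map the file again and touch the poisoned range. Without the patch,
	 * a brand new hugepage is silently faulted in (the memory corruption
	 * described in the commit message); with it, we expect SIGBUS.
	 */
	map = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		err(1, "mmap");
	map[0] = 2;

	printf("no SIGBUS: the poisoned page was silently replaced\n");
	return 0;
}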