On Wed, Oct 19, 2022 at 11:55 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 10/19/22 11:31, Yang Shi wrote:
> > On Tue, Oct 18, 2022 at 1:01 PM James Houghton <jthoughton@xxxxxxxxxx> wrote:
> > >
> > > This change is very similar to the change that was made for shmem [1],
> > > and it solves the same problem but for HugeTLBFS instead.
> > >
> > > Currently, when poison is found in a HugeTLB page, the page is removed
> > > from the page cache. That means that attempting to map or read that
> > > hugepage in the future will result in a new hugepage being allocated
> > > instead of notifying the user that the page was poisoned. As [1] states,
> > > this is effectively memory corruption.
> > >
> > > The fix is to leave the page in the page cache. If the user attempts to
> > > use a poisoned HugeTLB page with a syscall, the syscall will fail with
> > > EIO, the same error code that shmem uses. For attempts to map the page,
> > > the thread will get a BUS_MCEERR_AR SIGBUS.
> > >
> > > [1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
> > >
> > > Signed-off-by: James Houghton <jthoughton@xxxxxxxxxx>
> >
> > Thanks for the patch. Yes, we should do the same thing for hugetlbfs.
> > When I was working on shmem I did look into hugetlbfs too. But the
> > problem is that we make the whole hugetlb page unavailable even though
> > just one 4K sub-page is hwpoisoned. That may be fine for a 2M hugetlb
> > page, but a lot of memory may be wasted for a 1G hugetlb page,
> > particularly on the page fault path.
>
> One thing that complicated this a bit was the vmemmap optimization for
> hugetlb. However, I believe Naoya may have addressed this recently.
>
> > So I discussed this with Mike offline last year, and I was told Google
> > was working on PTE-mapped hugetlb pages. That should be able to solve
> > the problem, and we'd like to have the high-granularity hugetlb
> > mapping (HGM) support as the prerequisite.
>
> Yes, I went back in my notes and noticed it had been one year. No offense
> intended to James and his great work on HGM. However, in hindsight we
> should have fixed this in some way without waiting for an HGM-based
> solution.
>
> > There were some other details, but I can't remember all of them; I will
> > have to refresh my memory by rereading the email discussions...
>
> I think the complicating factor was the vmemmap optimization. As mentioned
> above, this may have already been addressed by Naoya in patches to
> indicate which sub-page(s) had the actual error.
>
> As Yang Shi notes, this patch makes the entire hugetlb page inaccessible.
> With some work, we could allow reads to everything but the sub-page with
> the error. However, this should be much easier with HGM, and we could
> potentially even allow page faults everywhere but the sub-page with the
> error.
>
> I still think it may be better to wait for HGM instead of trying to allow
> read access to all but the sub-page with the error now. But I am entirely
> open to other opinions, and I have no strong preference about which goes
> first.
>
> I plan to do a review of this patch a little later.
> --
> Mike Kravetz
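
For reference, below is a minimal userspace sketch (not part of the patch) of how the
post-patch behaviour described in the commit message could be observed: read() on the
poisoned range failing with EIO, and a fresh mapping of the range raising a
BUS_MCEERR_AR SIGBUS. It assumes a hugetlbfs mount at /dev/hugepages with a couple of
2MB hugepages reserved, CONFIG_MEMORY_FAILURE, and CAP_SYS_ADMIN for MADV_HWPOISON
injection; the file name and sizes are illustrative only.

#define _GNU_SOURCE
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE	(2UL << 20)	/* assumes 2MB default hugepage size */

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	/* With the patch, faulting on the poisoned hugepage should deliver
	 * SIGBUS with si_code == BUS_MCEERR_AR. */
	if (info->si_code == BUS_MCEERR_AR)
		write(STDOUT_FILENO, "SIGBUS BUS_MCEERR_AR\n", 21);
	_exit(0);
}

int main(void)
{
	struct sigaction sa = {
		.sa_sigaction	= sigbus_handler,
		.sa_flags	= SA_SIGINFO,
	};
	char buf[4096];
	char *map;
	int fd;

	sigaction(SIGBUS, &sa, NULL);

	fd = open("/dev/hugepages/hwpoison-test", O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		err(1, "open");

	map = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		err(1, "mmap");

	map[0] = 1;			/* fault the hugepage in */

	/* Inject an error into the first 4K sub-page of the hugepage. */
	if (madvise(map, getpagesize(), MADV_HWPOISON))
		err(1, "madvise(MADV_HWPOISON)");

	munmap(map, HPAGE_SIZE);

	/*
	 * With the patch, the poisoned hugepage stays in the page cache, so
	 * read() should fail with EIO (as shmem does). Without it, the page
	 * is gone and the range reads back as a hole.
	 */
	if (read(fd, buf, sizeof(buf)) < 0)
		printf("read: %s\n", strerror(errno));

	/*
	 * Map the file again and touch the poisoned range. Without the patch,
	 * a brand new hugepage is silently faulted in (the memory corruption
	 * described in the commit message); with it, we expect SIGBUS.
	 */
	map = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		err(1, "mmap");
	map[0] = 2;

	printf("no SIGBUS: the poisoned page was silently replaced\n");
	return 0;
}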