Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs

On 07.06.23 09:51, Yosry Ahmed wrote:
On Wed, Jun 7, 2023 at 12:38 AM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 07.06.23 00:40, David Rientjes wrote:
On Fri, 2 Jun 2023, Mike Kravetz wrote:

The benefit of HGM in the case of memory errors is fairly obvious.  As
mentioned above, when a memory error is encountered on a hugetlb page,
that entire hugetlb page becomes inaccessible to the application.  Losing
1G or even 2M of data is often catastrophic for an application.  There
is often no way to recover.  It just makes sense that recovering from
the loss of 4K of data would generally be easier and more likely to be
possible.  Today, when Oracle DB encounters a hard memory error on a
hugetlb page it will shut down.  Plans are currently in place to repair
and recover from such errors if possible.  Isolating the area of data
loss to a single 4K page significantly increases the likelihood of
repair and recovery.

Today, when a memory error is encountered on a hugetlb page an
application is 'notified' of the error by a SIGBUS, as well as the
virtual address of the hugetlb page and its size.  This makes sense as
hugetlb pages are accessed by a single page table entry, so you get all
or nothing.  As mentioned by James above, this is catastrophic for VMs
as the hypervisor has just been told that 2M or 1G is now inaccessible.
With HGM, we can isolate such errors to 4K.
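
To make that concrete, here is a minimal userspace sketch (not from
this thread; the handler body and recovery policy are illustrative
assumptions) of how an application receives these notifications: the
kernel delivers SIGBUS with si_code BUS_MCEERR_AR/AO, si_addr pointing
at the poisoned mapping, and si_addr_lsb encoding its size.

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
	if (info->si_code == BUS_MCEERR_AR || info->si_code == BUS_MCEERR_AO) {
		/*
		 * si_addr_lsb is the log2 of the lost range: 12 -> 4K,
		 * 21 -> 2M, 30 -> 1G.  With hugetlb today this is the
		 * full huge page size; with HGM it could be as small
		 * as 12 even for a 1G mapping.
		 */
		size_t lost = (size_t)1 << info->si_addr_lsb;

		/* fprintf() is not async-signal-safe; fine for a sketch. */
		fprintf(stderr, "hwpoison at %p, %zu bytes lost\n",
			info->si_addr, lost);
	}
	_exit(1);	/* a real application would try to repair/recover */
}

int main(void)
{
	struct sigaction sa = {
		.sa_sigaction	= on_sigbus,
		.sa_flags	= SA_SIGINFO,
	};

	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, NULL);
	/* ... map hugetlb memory and run the workload ... */
	pause();
	return 0;
}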

Backing VMs with hugetlb pages is a real use case today.  We are seeing
memory errors on such hugetlb pages with the result being VM failures.
One of the advantages of backing VMs with THPs is that they are split in
the case of memory errors.  HGM would allow similar functionality.

Thanks for this context, Mike, it's very useful.

I think everybody is aligned on the desire to map memory at smaller
granularities for multiple use cases and it's fairly clear that these use
cases are critically important to multiple stakeholders.

I think the open question is whether this functionality is supported in
hugetlbfs (like with HGM) or whether there is a hard requirement that we
must
use THP for this support.

I don't think that hugetlbfs is feature frozen, but if there's a strong
bias toward not merging additional complexity into the subsystem that
would be useful to know.  I personally think the critical use cases described

At least I, attending that session, thought it was clear that the
majority of the people speaking up expressed "no more added
complexity". So I think there is a clear strong bias, at least from the
people attending that session.


above justify the added complexity of HGM to hugetlb and we wouldn't be
blocked by the long-standing (15+ years) desire to mesh hugetlb into the
core MM subsystem before we can stop the pain associated with memory
poisoning and live migration.

Are there strong objections to extending hugetlb for this support?

I don't want to get too involved in this discussion (busy), but I
absolutely agree with the points that were raised at LSF/MM that

(A) hugetlb is complicated and very special (many things not integrated
with core-mm, so we need special-casing all over the place). [example:
what is a pte?]

(B) We added a bunch of complexity in the past that some people
considered very important (and it was not feature frozen, right? ;) ).
Looking back, we might just not have done some of that, or done it
differently/cleaner -- better integrated in the core. (PMD sharing,
MAP_PRIVATE, a reservation mechanism that still requires preallocation
because it fails with NUMA/fork, ...; see the sketch below)

(C) Unifying hugetlb and the core looks like it's getting more and more
out of reach, maybe even impossible with all the complexity we added
over the years (well, and keep adding).
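
To illustrate the reservation point in (B), a minimal sketch (sizes
and the helper name are assumptions): mmap() takes a reservation, yet
a later fault can still fail, so careful applications pre-touch the
whole mapping anyway.

#include <string.h>
#include <sys/mman.h>

#define LEN	(64UL << 20)	/* 64M of (assumed) 2M huge pages */

static void *map_hugetlb_pretouched(void)
{
	void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	/*
	 * The reservation taken at mmap() time does not guarantee the
	 * fault will succeed (e.g., under a NUMA mempolicy, or in a
	 * child after fork() with MAP_PRIVATE), so fault everything
	 * in now and fail at startup instead of SIGBUSing later.
	 */
	memset(p, 0, LEN);
	return p;
}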

Sure, HGM for the purpose of better hwpoison handling makes sense. But
hugetlb is probably 20 years old and hwpoison handling probably 13 years
old. So we managed to get quite far without that optimization.

Absolutely, HGM for better postcopy live migration also makes sense, I
guess nobody disagrees on that.
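
To show why postcopy hurts with hugetlb today, a hedged sketch of the
userfaultfd path (helper names and the 2M page size are assumptions):
registration works fine, but UFFDIO_COPY must fill an entire huge page
at once, so the destination stalls until the full 2M (or 1G) has
arrived over the network.  HGM would let postcopy fill 4K at a time.

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define HUGE_SZ	(2UL << 20)	/* assumed huge page size */

/* Register a hugetlb mapping for missing-fault handling. */
static int uffd_setup(void *area, size_t len)
{
	int uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	struct uffdio_api api = { .api = UFFD_API };
	struct uffdio_register reg = {
		/* start/len must be huge page aligned for hugetlb */
		.range	= { .start = (unsigned long)area, .len = len },
		.mode	= UFFDIO_REGISTER_MODE_MISSING,
	};

	if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api) ||
	    ioctl(uffd, UFFDIO_REGISTER, &reg))
		return -1;
	return uffd;
}

/*
 * Resolve a missing fault.  For hugetlb, dst and len must cover a
 * whole huge page; there is no way to resolve just the faulting 4K.
 */
static int uffd_fill(int uffd, void *dst, const void *src)
{
	struct uffdio_copy copy = {
		.dst	= (unsigned long)dst,
		.src	= (unsigned long)src,
		.len	= HUGE_SZ,
	};

	return ioctl(uffd, UFFDIO_COPY, &copy);
}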


But as discussed in that session, maybe we should just start anew and
implement something that integrates nicely with the core, instead of
making hugetlb more complicated and even more special.


Now, we all know, nobody wants to do the heavy lifting for that, that's
why we're discussing how to get in yet another complicated feature.

If nobody wants to do the heavy lifting and unifying hugetlb with core
MM is becoming impossible as you state, then does adding another
feature to hugetlb (that we all agree is useful for multiple
use cases) really make things worse? In other words, if someone

Well, if we (as a community) reject more complexity and outline an alternative of what would be acceptable (rewrite), people that really want these new features will *have to* do the heavy lifting.

[and I see many people involved in the discussion around HGM whose employers might have the capacity to do the heavy lifting if really required :P ]

decides tomorrow to do the heavy lifting, how much harder does this
become because of HGM, if at all?

I am the farthest from being an expert, just an observer here, but if
the answer to the above question is "HGM doesn't
actually make it worse" or "HGM only slightly makes things harder",
then I naively think that it's something that we should do, from a
pure cost-benefit analysis.

Well, there is always the "maintainability" aspect, because upstream has to maintain whatever complexity gets merged. No matter what, we'll have to keep maintaining the current set of hugetlb features until we can eventually deprecate some or all of them in the far, far future.

I, for my part, am happy as long as I can stay away as far as possible from hugetlb code. Again, Mike is the maintainer.

What I have seen so far regarding HGM does not count as "slightly makes things harder".


Again, I don't have a lot of context here, and I understand everyone's
frustration with the current state of hugetlb. Just my 2 cents.

The thing is, we all agree that something that hugetlb provides is valuable (i.e., pool of huge/large pages that we can map large), just that after 20 years there might be better ways of doing it and integrating it better with core-mm.

Yes, many people are frustrated with the current state. Adding more complexity won't improve things.

--
Cheers,

David / dhildenb
