On Fri, Nov 26, 2021 at 1:04 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 26.11.21 03:52, Peter Xu wrote:
> > On Thu, Nov 25, 2021 at 11:32:08AM +0100, David Hildenbrand wrote:
> >> On 25.11.21 11:24, Peter Xu wrote:
> >>> On Mon, Nov 22, 2021 at 10:40:54AM -0800, Shakeel Butt wrote:
> >>>>> Do we have a performance evaluation of how much overhead is added e.g., for
> >>>>> a single 4k MADV_DONTNEED call on a THP or on a MADV_DONTNEED call that
> >>>>> covers the whole THP?
> >>>>
> >>>> I did a simple benchmark of madvise(MADV_DONTNEED) on 10000 THPs on
> >>>> x86 for both settings you suggested. I don't see any statistically
> >>>> significant difference with and without the patch. Let me know if you
> >>>> want me to try something else.
> >>>
> >>> I'm a bit surprised that sync split thp didn't bring any extra overhead.
> >>>
> >>> "unmap whole thp" is understandable from that pov, because afaict that won't
> >>> even trigger any thp split anyway even delayed, if this is the simplest case
> >>> where only this process mapped this thp, and it's mapped only once.
> >>>
> >>> For "unmap 4k upon thp" IIUC that's the worst case, and zapping 4k should be
> >>> fast; what I don't understand is that thp split requires all hand-made work
> >>> for copying thp flags into small pages and so on, so I thought there should at
> >>> least be some overhead measured. Shakeel, could there be something overlooked
> >>> in the test, or maybe it's me that overlooked something?
> >>>
> >>> I had the same concern as what Kirill/Matthew raised in the other thread - I'm
> >>> worried proactively splitting simply because any 4k page is zapped might
> >>> quickly free up 2m thps in the system, and I'm not sure whether it'll aggravate
> >>> the fragmentation of the system memory in general. I'm also not sure whether
> >>> that's ideal for some very common workloads that frequently use DONTNEED to
> >>> proactively drop some pages.
> >>
> >> The pageblock corresponding to the THP is movable. So (unless we start
> >> spilling unmovable allocations into movable pageblocks) we'd only place
> >> movable allocations in there. Compaction will be able to migrate to
> >> re-create a free THP.
> >>
> >> In contrast, I think compaction will happily skip over the THP and
> >> ignore it, because it has no clue that the THP could be repurposed by
> >> split+migrate (at least I am not aware of code that does it).
> >>
> >> Unless I am missing something, with the above in mind it could make
> >> sense to split as soon as possible, even before we're under memory
> >> pressure -- for example, for proactive compaction.
> >>
> >> [proactive compaction could try splitting first as well, I think]
> >
> > But we can't rely on proactive compaction for rapid operations, because it's
> > still adding overhead to the overall system by split+merge, right?
>
> Yes, but there is also direct compaction that can be triggered without
> the shrinker getting involved. I think we can summarize it as "there might
> not be a right or wrong when to split". An application that
> MADV_DONTNEEDs/munmaps sub-THP memory told us that it doesn't want to
> consume memory, yet it looks like it's still consuming that memory.
>
> I do wonder how THPs on the deferred split queue behave with respect to
> page migration -- memory offlining, alloc_contig_range(). I saw reports

IIUC, the old page that is on the deferred split queue would be freed and
taken off the list once the migration is done. But the new page is *NOT*
added to the deferred split list at all. This would not cause any severe
bugs, but the partially unmapped pages can no longer be shrunk under
memory pressure.

> that there are some cases where THP can be problematic when
> stress-testing THP:
> https://lkml.kernel.org/r/20210903162102.GA10039@mam-gdavis-dt
>
> But not sure if that's related to deferred splitting. Most probably not.
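For concreteness, the operation being debated -- MADV_DONTNEED on a
single 4k page of each THP versus on the whole THP -- can be sketched
from userspace roughly as below. This is an illustrative reconstruction,
not the benchmark that was actually run; the 2M THP size, the 10000-THP
count and the two cases come from the thread, everything else
(alignment, flags, timing) is assumed:

/*
 * Illustrative sketch only: map THP-backed anonymous memory, fault it
 * in, then time MADV_DONTNEED on either a single 4k page of each THP
 * or on each whole THP.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define THP_SIZE (2UL << 20)  /* assumed PMD-sized THP on x86-64 */
#define NR_THPS  10000UL      /* count from the thread; needs ~20G, scale down as needed */

int main(int argc, char **argv)
{
	/* "whole" zaps each full THP, anything else punches a 4k hole */
	size_t chunk = (argc > 1 && !strcmp(argv[1], "whole")) ? THP_SIZE : 4096;
	size_t len = NR_THPS * THP_SIZE;
	struct timespec t0, t1;
	unsigned long i;
	char *buf, *raw;

	raw = mmap(NULL, len + THP_SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* align to 2M so each chunk below lands inside one THP */
	buf = (char *)(((uintptr_t)raw + THP_SIZE - 1) & ~(THP_SIZE - 1));
	/* ask for THPs (needs THP enabled) and fault everything in */
	madvise(buf, len, MADV_HUGEPAGE);
	memset(buf, 1, len);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < NR_THPS; i++)
		madvise(buf + i * THP_SIZE, chunk, MADV_DONTNEED);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%lu MADV_DONTNEED calls of %zu bytes: %.3f ms\n",
	       NR_THPS, chunk,
	       (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6);
	return 0;
}

Without the proposed patch, the 4k case splits the PMD into PTEs, zaps
one of them, and leaves the compound page to the deferred split queue;
with the patch, the underlying THP is split synchronously inside
madvise().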
> >
> > +compaction_proactiveness
> > +========================
> > + ...
> > +Note that compaction has a non-trivial system-wide impact as pages
> > +belonging to different processes are moved around, which could also lead
> > +to latency spikes in unsuspecting applications. The kernel employs
> > +various heuristics to avoid wasting CPU cycles if it detects that
> > +proactive compaction is not being effective.
> >
> > Delaying the split makes sense to me because, after all, the kernel is not
> > aware of the userspace's preference, so the best thing is to do nothing
> > until necessary.
> >
> > Proactively splitting thps in dontneed/unmap adds an assumption that the
> > userspace wants to break the pages by default. It's 100% true for Shakeel's
> > use case, but I'm not sure whether it's always true. That's why I thought
> > maybe a new interface is more proper, so we at least won't break anyone
> > by accident.
>
> Well, we already broke the PMD into PTEs. So the performance gain at
> least for that user is really gone until we "fix that" again via
> khugepaged -- which might just be configured to not "fix" if there are
> empty PTEs.
>
> It for sure is interesting if you have a COW huge page and only one
> party zaps/unmaps some part.
>
> --
> Thanks,
>
> David / dhildenb
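For reference, the tunables touched on above -- the
compaction_proactiveness sysctl and khugepaged's tolerance for empty
PTEs (presumably max_ptes_none) -- can be inspected from userspace; a
minimal sketch, assuming the upstream paths (they may be absent on
kernels built without compaction or THP support):

/*
 * Illustrative only: dump the two knobs discussed above. Paths are the
 * upstream sysctl/sysfs locations.
 */
#include <stdio.h>

static void show(const char *path)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (f && fgets(buf, sizeof(buf), f))
		printf("%s: %s", path, buf);  /* buf keeps its trailing newline */
	else
		printf("%s: <unavailable>\n", path);
	if (f)
		fclose(f);
}

int main(void)
{
	show("/proc/sys/vm/compaction_proactiveness");
	show("/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none");
	return 0;
}

Setting max_ptes_none to 0 would be one way to keep khugepaged from
re-collapsing ranges that an application deliberately punched holes into.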