On Fri, Feb 26, 2016 at 01:32:53PM +0300, Kirill A. Shutemov wrote:
> Could you elaborate on problems with rmap? I haven't looked into this
> deeply yet.
>
> Do you see anything that would prevent the following basic scheme:
>
> - Identify a series of small pages as candidates for collapsing into
>   a compound page. Not sure how difficult it would be. I guess it can be
>   done by looking for adjacent pages which belong to the same anon_vma.

Yes, just as if no other process were sharing them.

> - Set up migration entries for the ptes which map these pages.
>
> - Collapse the small pages into a compound page. IIUC, it will only be
>   possible if these pages are not pinned.
>
> - Replace the migration entries with ptes which point to subpages of the
>   new compound page.
>
> - Scan over all vmas mapping this compound page, looking for VMAs
>   suitable for a huge page. We cannot collapse it right away due to lock
>   inversion of anon_vma->rwsem vs. mmap_sem.
>
> - For the VMAs found, collapse the page table into a PMD, one VMA at a
>   time, under down_write(mmap_sem).
>
> Even if we fail to create any PMDs, we would reduce LRU pressure by
> collapsing small pages into a compound one.

I see how your new refcounting simplifies things, as we don't have to create hugepmds immediately, but we still have to modify all ptes of all sharers, not just those belonging to the vma we collapsed (or we'd effectively be copying on collapse, in turn losing the sharing).

If we deferred it and temporarily left the new THP and the old 4k pages both allocated and independently mapped, a process running on the old ptes could gup_fast and a process running on the new ptes could gup_fast too, and we'd end up with double memory usage. So we'd need a way to redirect gup_fast on the old ptes to the new THP, so that future pins always go to the new THP. Some new linkage between old ptes and new ptes would also be needed to keep walking it slowly, and it would have to be invalidated during COWs.
Doing it incrementally, without updating all ptes at once, wouldn't be straightforward. Doing it non-incrementally would mean paying the cost of updating (in the worst case) up to a hundred thousand ptes at full CPU usage, for a later gain we're not sure about.

That said, I think it's a worthy goal to achieve, especially if we remove compaction from direct reclaim.

Thanks,
Andrea