On 11/09/2018 07:34 AM, Zi Yan wrote:
> On 9 Nov 2018, at 8:11, Mel Gorman wrote:
>
>> On Fri, Nov 09, 2018 at 03:13:18PM +0300, Kirill A. Shutemov wrote:
>>> On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote:
>>>> The basic idea as outlined by Mel Gorman in [2] is:
>>>>
>>>> 1) On first fault in a sufficiently sized range, allocate a huge page
>>>>    sized and aligned block of base pages. Map the base page
>>>>    corresponding to the fault address and hold the rest of the pages in
>>>>    reserve.
>>>> 2) On subsequent faults in the range, map the pages from the reservation.
>>>> 3) When enough pages have been mapped, promote the mapped pages and
>>>>    remaining pages in the reservation to a huge page.
>>>> 4) When there is memory pressure, release the unused pages from their
>>>>    reservations.
>>> I haven't yet read the patch in details, but I'm skeptical about the
>>> approach in general for few reasons:
>>>
>>> - PTE page table retracting to replace it with huge PMD entry requires
>>>   down_write(mmap_sem). It makes the approach not practical for many
>>>   multi-threaded workloads.
>>>
>>>   I don't see a way to avoid exclusive lock here. I will be glad to
>>>   be proved otherwise.
>>>
>> That problem is somewhat fundamental to the mmap_sem itself and
>> conceivably it could be alleviated by range-locking (if that gets
>> completed). The other thing to bear in mind is the timing. If the
>> promotion is in-place due to reservations, there isn't the allocation
>> overhead and the hold times *should* be short.
>>
> Is it possible to convert all these PTEs to migration entries during
> the promotion and replace them with a huge PMD entry afterwards?
> AFAIK, migrating pages does not require holding a mmap_sem.
> Basically, it will act like migrating 512 base pages to a THP without
> actually doing the page copy.

That's an interesting idea. I'll look into it.

Thanks,
Anthony

>
> --
> Best Regards
> Yan Zi
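
For readers following the thread, the four-step reservation scheme quoted
above can be modeled as a small user-space toy. This is only a sketch of
the policy, not code from the patch: the names AddressSpace, Reservation,
fault() and shrink() are hypothetical, PROMOTE_THRESHOLD is an invented
value (the thread only says "enough pages"), and the model ignores the
mmap_sem locking question that the replies are actually about. The only
number taken from the thread is 512 base pages per huge page.

```python
from dataclasses import dataclass, field

PAGES_PER_HUGE = 512     # base pages per huge page, as mentioned in the thread
PROMOTE_THRESHOLD = 64   # hypothetical stand-in for "enough pages" in step 3


@dataclass
class Reservation:
    mapped: set = field(default_factory=set)  # offsets of mapped base pages
    reserved: int = PAGES_PER_HUGE            # pages still held in reserve
    intact: bool = True                       # False once step 4 released it
    promoted: bool = False


class AddressSpace:
    """Toy model: tracks reservations per huge-page-aligned region."""

    def __init__(self):
        self.reservations = {}  # region index -> Reservation

    def fault(self, page_number):
        region, offset = divmod(page_number, PAGES_PER_HUGE)
        res = self.reservations.get(region)
        if res is None:
            # Step 1: first fault in the region reserves an aligned block.
            res = self.reservations[region] = Reservation()
        if res.promoted:
            return "huge"  # region is already backed by a huge page
        if offset not in res.mapped:
            if res.intact:
                # Steps 1 and 2: map the faulting page out of the reservation.
                res.reserved -= 1
            # Otherwise the reservation was released, so this models an
            # ordinary standalone base page allocation.
            res.mapped.add(offset)
        # Step 3: promote in place once enough pages are mapped.
        if res.intact and len(res.mapped) >= PROMOTE_THRESHOLD:
            res.promoted = True
            res.reserved = 0
            return "promoted"
        return "base"

    def shrink(self):
        """Step 4: under memory pressure, give back unused reserved pages."""
        released = 0
        for res in self.reservations.values():
            if res.intact and not res.promoted:
                released += res.reserved
                res.reserved = 0
                res.intact = False
        return released
```

In this model a region that has lost its reservation in shrink() keeps its
already-mapped base pages but can no longer be promoted in place, which is
the trade-off step 4 implies.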