On 9 Nov 2018, at 8:11, Mel Gorman wrote:

> On Fri, Nov 09, 2018 at 03:13:18PM +0300, Kirill A. Shutemov wrote:
>> On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote:
>>> The basic idea as outlined by Mel Gorman in [2] is:
>>>
>>> 1) On first fault in a sufficiently sized range, allocate a huge page
>>>    sized and aligned block of base pages.  Map the base page
>>>    corresponding to the fault address and hold the rest of the pages in
>>>    reserve.
>>> 2) On subsequent faults in the range, map the pages from the reservation.
>>> 3) When enough pages have been mapped, promote the mapped pages and
>>>    remaining pages in the reservation to a huge page.
>>> 4) When there is memory pressure, release the unused pages from their
>>>    reservations.
>>
>> I haven't yet read the patch in details, but I'm skeptical about the
>> approach in general for few reasons:
>>
>> - PTE page table retracting to replace it with huge PMD entry requires
>>   down_write(mmap_sem). It makes the approach not practical for many
>>   multi-threaded workloads.
>>
>>   I don't see a way to avoid exclusive lock here. I will be glad to
>>   be proved otherwise.
>>
>
> That problem is somewhat fundamental to the mmap_sem itself and
> conceivably it could be alleviated by range-locking (if that gets
> completed). The other thing to bear in mind is the timing. If the
> promotion is in-place due to reservations, there isn't the allocation
> overhead and the hold times *should* be short.
>

Is it possible to convert all these PTEs to migration entries during
the promotion and replace them with a huge PMD entry afterwards?
AFAIK, migrating pages does not require holding a mmap_sem.
Basically, it will act like migrating 512 base pages to a THP without
actually doing the page copy.

--
Best Regards
Yan Zi
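
For readers following the thread, the four quoted steps can be modeled with a toy userspace simulation. This is only a sketch under stated assumptions, not the patch's implementation: the names `ReservationTable`, `fault`, `reclaim` and `promote_threshold` are hypothetical, the promotion threshold is a free parameter because the thread only says "enough pages", and 512 base pages per huge page assumes 4 KiB pages with 2 MiB huge pages. Locking (mmap_sem, migration entries) is not modeled at all.

```python
HPAGE_NR = 512  # assumed: base pages per huge page (2 MiB / 4 KiB)


class Reservation:
    """Hypothetical model of one huge-page-sized, aligned block of base pages."""

    def __init__(self, base_index):
        self.base_index = base_index  # first base page of the aligned range
        self.mapped = set()           # offsets already mapped by faults
        self.promoted = False         # True once replaced by a huge mapping
        self.released = False         # True once unused pages were given back


class ReservationTable:
    """Toy model of the quoted steps 1-4; the threshold is an assumption."""

    def __init__(self, promote_threshold=HPAGE_NR):
        self.promote_threshold = promote_threshold
        self.reservations = {}

    def fault(self, page_index):
        base = page_index - page_index % HPAGE_NR
        res = self.reservations.get(base)
        if res is None:
            # Step 1: first fault in the range reserves the whole aligned block.
            res = Reservation(base)
            self.reservations[base] = res
        if res.promoted:
            return "huge"  # range is already covered by a huge mapping
        # Steps 1 and 2: map the faulting base page (from the reservation,
        # or individually if the reservation was released under pressure).
        res.mapped.add(page_index - base)
        if not res.released and len(res.mapped) >= self.promote_threshold:
            # Step 3: enough pages mapped, promote in place without copying.
            res.promoted = True
            return "promoted"
        return "base"

    def reclaim(self):
        """Step 4: under memory pressure, release unused reserved pages."""
        freed = 0
        for res in self.reservations.values():
            if not res.promoted and not res.released:
                freed += HPAGE_NR - len(res.mapped)
                res.released = True
        return freed
```

With `promote_threshold=4`, four faults on distinct pages of the same aligned range end in a promotion, and a range that was touched only once gives back its 511 unused pages on `reclaim()`.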