On Fri, Nov 09, 2018 at 10:34:07AM -0500, Zi Yan wrote:
> On 9 Nov 2018, at 8:11, Mel Gorman wrote:
>
> > On Fri, Nov 09, 2018 at 03:13:18PM +0300, Kirill A. Shutemov wrote:
> >> On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote:
> >>> The basic idea as outlined by Mel Gorman in [2] is:
> >>>
> >>> 1) On first fault in a sufficiently sized range, allocate a huge page
> >>>    sized and aligned block of base pages.  Map the base page
> >>>    corresponding to the fault address and hold the rest of the pages in
> >>>    reserve.
> >>> 2) On subsequent faults in the range, map the pages from the reservation.
> >>> 3) When enough pages have been mapped, promote the mapped pages and
> >>>    remaining pages in the reservation to a huge page.
> >>> 4) When there is memory pressure, release the unused pages from their
> >>>    reservations.
> >>
> >> I haven't yet read the patch in details, but I'm skeptical about the
> >> approach in general for few reasons:
> >>
> >> - PTE page table retracting to replace it with huge PMD entry requires
> >>   down_write(mmap_sem). It makes the approach not practical for many
> >>   multi-threaded workloads.
> >>
> >>   I don't see a way to avoid exclusive lock here. I will be glad to
> >>   be proved otherwise.
> >>
> >
> > That problem is somewhat fundamental to the mmap_sem itself and
> > conceivably it could be alleviated by range-locking (if that gets
> > completed). The other thing to bear in mind is the timing. If the
> > promotion is in-place due to reservations, there isn't the allocation
> > overhead and the hold times *should* be short.
> >
>
> Is it possible to convert all these PTEs to migration entries during
> the promotion and replace them with a huge PMD entry afterwards?
> AFAIK, migrating pages does not require holding a mmap_sem.
> Basically, it will act like migrating 512 base pages to a THP without
> actually doing the page copy.

You'll still need down_write(mmap_sem) to convert a PTE page table full of
migration entries to a PMD entry. It's required at least to protect against
a parallel MADV_DONTNEED that can zap migration entries under you.

-- 
 Kirill A. Shutemov