On Tue, 10 Mar 2015, Peter Zijlstra wrote:

> On Mon, Mar 09, 2015 at 04:48:43PM -0400, Eric B Munson wrote:
> > Currently, pages which are marked as unevictable are protected from
> > compaction, but not from other types of migration.  The mlock
> > description does not promise that all page faults will be avoided, only
> > major ones so this protection is not necessary.  This extra protection
> > can cause problems for applications that are using mlock to avoid
> > swapping pages out, but require order > 0 allocations to continue to
> > succeed in a fragmented environment.  This patch removes the
> > ISOLATE_UNEVICTABLE mode and the check for it in __isolate_lru_page().
> > Removing this check allows the removal of the isolate_mode argument from
> > isolate_migratepages_block() because it can compute the required mode
> > from the compact_control structure.
> >
> > To illustrate this problem I wrote a quick test program that mmaps a
> > large number of 1MB files filled with random data.  These maps are
> > created locked and read only.  Then every other mmap is unmapped and I
> > attempt to allocate huge pages to the static huge page pool.  Without
> > this patch I am unable to allocate any huge pages after fragmenting
> > memory.  With it, I can allocate almost all the space freed by unmapping
> > as huge pages.
>
> So mlock() is part of the POSIX real-time spec. For real-time purposes
> we very much do _NOT_ want page migration to happen.
>
> So while you might be following the letter of the spec you're very much
> violating the spirit of the thing.
>

Fair enough, but the documentation in the mlock manpage only explicitly
promises to prevent major faults.  If this patch is not taken, then the
manpage for mlock needs a note explaining that mlock prevents compaction
as well.  The confusion our userspace devs had stems from this: they
thought they could use mlock to avoid swapping, but still benefit from
compaction for order > 0 allocations.

> Also, there is another solution to your problem; you can compact
> mlock'ed pages at mlock() time.

This might work for some cases, I'd have to spend some time thinking on
it, but it won't work in my case.  Memory is fragmented by unmapping as
data is no longer needed, so we really do need to compact the locked
pages that are left.

>
> Furthermore, I would once again like to remind people of my VM_PINNED
> patches. The only thing that needs happening there is someone needs to
> deobfuscate the IB code.

Hence my attempt to kick that discussion last week.  Unfortunately, I
cannot provide any help with the IB code.  Having this mechanism would
give us a way to continue to allow real-time users to avoid all faults
while giving anyone who wants to avoid only major faults a way to do so.
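
For anyone wanting to reproduce the fragmentation step, below is a rough
sketch of the approach described in the quoted patch text.  It is not the
original test program.  The helper names (map_random_file,
fragment_locked) and the temp file path are made up for illustration, and
it calls mlock() after mmap() instead of creating the maps locked, so a
low RLIMIT_MEMLOCK shows up as a count and not as a hard failure.

```c
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: create an unlinked temp file of `size` bytes
 * filled with pseudo-random data and map it read only.
 * Returns MAP_FAILED on any error.
 */
static void *map_random_file(size_t size)
{
	char path[] = "/tmp/mlock-frag-XXXXXX";	/* assumed location */
	int fd = mkstemp(path);
	void *p = MAP_FAILED;
	unsigned char *buf;
	size_t i;

	if (fd < 0)
		return MAP_FAILED;
	unlink(path);	/* file persists until fd and mapping are gone */

	buf = malloc(size);
	if (buf) {
		for (i = 0; i < size; i++)
			buf[i] = (unsigned char)rand();
		if (write(fd, buf, size) == (ssize_t)size)
			p = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
		free(buf);
	}
	close(fd);	/* the mapping keeps its own reference to the file */
	return p;
}

/*
 * Map `count` files, try to mlock() each mapping, then unmap every
 * other one so the locked survivors are interleaved with holes.
 * Survivors are stored in `kept` (needs room for `count` entries).
 * Returns the number of mappings left mapped, or -1 if a mapping
 * could not be created.  *locked is set to the number of mlock()
 * calls that succeeded.
 */
static int fragment_locked(void **kept, int count, size_t size, int *locked)
{
	void **maps;
	int i, nkept = 0;

	*locked = 0;
	maps = calloc(count, sizeof(*maps));
	if (!maps)
		return -1;

	for (i = 0; i < count; i++) {
		maps[i] = map_random_file(size);
		if (maps[i] == MAP_FAILED) {
			while (i-- > 0)
				munmap(maps[i], size);
			free(maps);
			return -1;
		}
		/* May fail under RLIMIT_MEMLOCK; only successes are counted. */
		if (mlock(maps[i], size) == 0)
			(*locked)++;
	}

	for (i = 0; i < count; i++) {
		if (i % 2)
			munmap(maps[i], size);	/* free every other mapping */
		else
			kept[nkept++] = maps[i];
	}
	free(maps);
	return nkept;
}
```

Calling fragment_locked() with a large count and a 1MB size, then
growing the static pool as root through /proc/sys/vm/nr_hugepages,
should approximate the scenario above.  Whether the pool actually grows
depends on the kernel and on how much memory ended up locked.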