On Thu, Aug 06, 2015 at 05:33:54PM +0200, Vlastimil Babka wrote:
> On 07/29/2015 05:42 PM, Eric B Munson wrote:
> >The cost of faulting in all memory to be locked can be very high when
> >working with large mappings.  If only portions of the mapping will be
> >used, this can incur a high penalty for locking.
> >
> >For the example of a large file, this is the usage pattern for a large
> >statistical language model (probably applies to other statistical or
> >graphical models as well).  For the security example, any application
> >transacting in data that cannot be swapped out (credit card data,
> >medical records, etc.).
> >
> >This patch introduces the ability to request that pages are not
> >pre-faulted, but are placed on the unevictable LRU when they are finally
> >faulted in.  The VM_LOCKONFAULT flag will be used together with
> >VM_LOCKED and has no effect when set without VM_LOCKED.  Setting the
> >VM_LOCKONFAULT flag for a VMA will cause pages faulted into that VMA to
> >be added to the unevictable LRU when they are faulted in, or immediately
> >if they are already present, but will not cause any missing pages to be
> >faulted in.
> >
> >Exposing this new lock state means that we cannot overload the meaning
> >of the FOLL_POPULATE flag any longer.  Prior to this patch it was used
> >to mean that the VMA for a fault was locked.  This means we need the
> >new FOLL_MLOCK flag to communicate the locked state of a VMA.
> >FOLL_POPULATE will now only control whether the VMA should be populated,
> >and in the case of VM_LOCKONFAULT it will not be set.
> >
> >Signed-off-by: Eric B Munson <emunson@xxxxxxxxxx>
> >Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> >Cc: Michal Hocko <mhocko@xxxxxxx>
> >Cc: Vlastimil Babka <vbabka@xxxxxxx>
> >Cc: Jonathan Corbet <corbet@xxxxxxx>
> >Cc: "Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx>
> >Cc: linux-kernel@xxxxxxxxxxxxxxx
> >Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
> >Cc: linux-mm@xxxxxxxxx
> >Cc: linux-api@xxxxxxxxxxxxxxx
> >---
> >  drivers/gpu/drm/drm_vm.c |  8 +++++++-
> >  fs/proc/task_mmu.c       |  1 +
> >  include/linux/mm.h       |  2 ++
> >  kernel/fork.c            |  2 +-
> >  mm/debug.c               |  1 +
> >  mm/gup.c                 | 10 ++++++++--
> >  mm/huge_memory.c         |  2 +-
> >  mm/hugetlb.c             |  4 ++--
> >  mm/mlock.c               |  2 +-
> >  mm/mmap.c                |  2 +-
> >  mm/rmap.c                |  4 ++--
> >  11 files changed, 27 insertions(+), 11 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
> >index aab49ee..103a5f6 100644
> >--- a/drivers/gpu/drm/drm_vm.c
> >+++ b/drivers/gpu/drm/drm_vm.c
> >@@ -699,9 +699,15 @@ int drm_vma_info(struct seq_file *m, void *data)
> >  		   (void *)(unsigned long)virt_to_phys(high_memory));
> >
> >  	list_for_each_entry(pt, &dev->vmalist, head) {
> >+		char lock_flag = '-';
> >+
> >  		vma = pt->vma;
> >  		if (!vma)
> >  			continue;
> >+		if (vma->vm_flags & VM_LOCKONFAULT)
> >+			lock_flag = 'f';
> >+		else if (vma->vm_flags & VM_LOCKED)
> >+			lock_flag = 'l';
> >  		seq_printf(m,
> >  			   "\n%5d 0x%pK-0x%pK %c%c%c%c%c%c 0x%08lx000",
> >  			   pt->pid,
> >@@ -710,7 +716,7 @@ int drm_vma_info(struct seq_file *m, void *data)
> >  			   vma->vm_flags & VM_WRITE ? 'w' : '-',
> >  			   vma->vm_flags & VM_EXEC ? 'x' : '-',
> >  			   vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
> >-			   vma->vm_flags & VM_LOCKED ? 'l' : '-',
> >+			   lock_flag,
> >  			   vma->vm_flags & VM_IO ? 'i' : '-',
> >  			   vma->vm_pgoff);
> >
> >diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> >index ca1e091..38d69fc 100644
> >--- a/fs/proc/task_mmu.c
> >+++ b/fs/proc/task_mmu.c
> >@@ -579,6 +579,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>
> This function has the following comment:
>
>     Don't forget to update Documentation/ on changes.
>
> [...]
>
> >--- a/mm/gup.c
> >+++ b/mm/gup.c
> >@@ -92,7 +92,7 @@ retry:
> >  		 */
> >  		mark_page_accessed(page);
> >  	}
> >-	if ((flags & FOLL_POPULATE) && (vma->vm_flags & VM_LOCKED)) {
> >+	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
> >  		/*
> >  		 * The preliminary mapping check is mainly to avoid the
> >  		 * pointless overhead of lock_page on the ZERO_PAGE
> >@@ -265,6 +265,9 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
> >  	unsigned int fault_flags = 0;
> >  	int ret;
> >
> >+	/* mlock all present pages, but do not fault in new pages */
> >+	if ((*flags & (FOLL_POPULATE | FOLL_MLOCK)) == FOLL_MLOCK)
> >+		return -ENOENT;
> >  	/* For mm_populate(), just skip the stack guard page. */
> >  	if ((*flags & FOLL_POPULATE) &&
> >  	    (stack_guard_page_start(vma, address) ||
> >@@ -850,7 +853,10 @@ long populate_vma_page_range(struct vm_area_struct *vma,
> >  	VM_BUG_ON_VMA(end > vma->vm_end, vma);
> >  	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
> >
> >-	gup_flags = FOLL_TOUCH | FOLL_POPULATE;
> >+	gup_flags = FOLL_TOUCH | FOLL_MLOCK;
> >+	if ((vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT)) == VM_LOCKED)
> >+		gup_flags |= FOLL_POPULATE;
> >+
> >  	/*
> >  	 * We want to touch writable mappings with a write fault in order
> >  	 * to break COW, except for shared mappings because these don't COW
>
> I think this might be breaking the populate part of the
> mmap(MAP_POPULATE & ~MAP_LOCKED) case, if I follow the execution
> correctly (it's far from simple...)
>
> SYSCALL_DEFINE6(mmap_pgoff... with MAP_POPULATE
>   vm_mmap_pgoff(..., MAP_POPULATE...)
>     do_mmap_pgoff(...MAP_POPULATE... &populate) -> populate == TRUE
>   mm_populate()
>     __mm_populate()
>       populate_vma_page_range()
>
> Previously, this path would have FOLL_POPULATE in gup_flags and continue
> with __get_user_pages() and faultin_page() (actually regardless of
> FOLL_POPULATE), which would fault in the pages.
>
> After your patch, populate_vma_page_range() will set FOLL_MLOCK, but since
> VM_LOCKED is not set, FOLL_POPULATE won't be set either.
> Then faultin_page() will return on the new check:
>
>     (flags & (FOLL_POPULATE | FOLL_MLOCK)) == FOLL_MLOCK

Good catch! I guess it should be something like:

	gup_flags = FOLL_TOUCH | FOLL_POPULATE | FOLL_MLOCK;
	if (vma->vm_flags & VM_LOCKONFAULT)
		gup_flags &= ~FOLL_POPULATE;
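
Spelling out what that gives for the three relevant VMA states (a sketch
of the intended behaviour only, derived from the hunks quoted above):

	/*
	 * !VM_LOCKED (e.g. plain mmap(MAP_POPULATE) without mlock):
	 *	gup_flags = FOLL_TOUCH | FOLL_POPULATE | FOLL_MLOCK
	 *	faultin_page() keeps faulting pages in as before, and
	 *	the FOLL_MLOCK branch in gup.c is skipped because it
	 *	also tests VM_LOCKED.
	 *
	 * VM_LOCKED:
	 *	same flags as above; pages are faulted in and mlocked,
	 *	as before this series.
	 *
	 * VM_LOCKED | VM_LOCKONFAULT:
	 *	gup_flags = FOLL_TOUCH | FOLL_MLOCK
	 *	the new faultin_page() check returns -ENOENT for pages
	 *	that are not present, while pages that are already
	 *	present get mlocked.
	 */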
-- 
 Kirill A. Shutemov
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel