On 01/15/2016 07:11 PM, Dave Hansen wrote:
Just sending an update to this one patch instead of resending the entire series. Jan Kara suggested that we just change get_user_pages()'s prototype to remove tsk/mm instead of introducing a separate get_user_pages_current(). Also, we moved the "_foreign" in get_user_pages_foreign() to the end to be more consistent with the "_unlocked" version.
Thanks, and you could have blamed me too, not just Jan ;)
This approach will break any new users of get_user_pages() which try to pass a tsk/mm, but Jan doesn't think these are frequent enough to be a concern. This passes an allyesconfig on 4.4, at least. As always, any acks on this approach would be much appreciated. This is the largest swath of non-x86 code that protection keys touches, and I'm sure the x86 maintainers would appreciate seeing some acks from folks on it.
This is finally a thorough review attempt, sorry I didn't catch some of the stuff below earlier.
--- From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> For protection keys, we need to understand whether protections should be enforced in software or not. In general, we enforce protections when working on our own task, but not when on others. We call these "current" and "foreign" operations. This patch introduces a new get_user_pages() variant: get_user_pages_foreign() The plain get_user_pages() can no longer be used on mm/tasks other than 'current/current->mm', which is by far the most common way it is called. Using it makes a few of the call sites look a bit nicer. get_user_pages_foreign() is a replacement for when get_user_pages() is called on non-current tsk/mm. This also switches get_user_pages_unlocked() over to be like get_user_pages() and not take a tsk/mm. If someone wants the get_user_pages_unlocked() behavior with a non-current tsk/mm, they just have to use __get_user_pages_unlocked() directly.
Hmm, but your patch actually changes __get_user_pages_unlocked() to also not include the task and mm params and assume current and current->mm? Wouldn't it be more consistent if __get_user_unlocked() stayed as it is?
What you say above is true for {__}get_user_pages_locked.
Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> Cc: vbabka@xxxxxxx Cc: jack@xxxxxxx --- b/arch/cris/arch-v32/drivers/cryptocop.c | 8 --- b/arch/ia64/kernel/err_inject.c | 3 - b/arch/mips/mm/gup.c | 3 - b/arch/s390/mm/gup.c | 4 - b/arch/sh/mm/gup.c | 2 b/arch/sparc/mm/gup.c | 2 b/arch/x86/mm/gup.c | 2 b/arch/x86/mm/mpx.c | 4 - b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 - b/drivers/gpu/drm/i915/i915_gem_userptr.c | 2 b/drivers/gpu/drm/radeon/radeon_ttm.c | 3 - b/drivers/gpu/drm/via/via_dmablit.c | 3 - b/drivers/infiniband/core/umem.c | 2 b/drivers/infiniband/core/umem_odp.c | 8 +-- b/drivers/infiniband/hw/mthca/mthca_memfree.c | 3 - b/drivers/infiniband/hw/qib/qib_user_pages.c | 3 - b/drivers/infiniband/hw/usnic/usnic_uiom.c | 2 b/drivers/media/pci/ivtv/ivtv-udma.c | 4 - b/drivers/media/pci/ivtv/ivtv-yuv.c | 10 +--- b/drivers/media/v4l2-core/videobuf-dma-sg.c | 3 - b/drivers/misc/mic/scif/scif_rma.c | 2 b/drivers/misc/sgi-gru/grufault.c | 3 - b/drivers/scsi/st.c | 2 b/drivers/staging/rdma/hfi1/user_pages.c | 3 - b/drivers/staging/rdma/ipath/ipath_user_pages.c | 3 - b/drivers/video/fbdev/pvr2fb.c | 4 - b/drivers/virt/fsl_hypervisor.c | 5 -- b/fs/exec.c | 8 ++- b/include/linux/mm.h | 23 +++++----- b/kernel/events/uprobes.c | 4 - b/mm/frame_vector.c | 2 b/mm/gup.c | 51 +++++++++++++++--------- b/mm/ksm.c | 2 b/mm/memory.c | 2 b/mm/mempolicy.c | 6 +- b/mm/nommu.c | 35 +++++++++------- b/mm/process_vm_access.c | 6 +- b/mm/util.c | 4 - b/net/ceph/pagevec.c | 2 b/security/tomoyo/domain.c | 9 +++- b/virt/kvm/async_pf.c | 2 b/virt/kvm/kvm_main.c | 13 ++---- 42 files changed, 135 insertions(+), 130 deletions(-)
[...]
--- a/kernel/events/uprobes.c~get_current_user_pages 2016-01-15 09:45:42.110046066 -0800 +++ b/kernel/events/uprobes.c 2016-01-15 09:45:42.152047953 -0800 @@ -298,7 +298,7 @@ int uprobe_write_opcode(struct mm_struct retry: /* Read the page with vaddr into memory */ - ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &old_page, &vma); + ret = get_user_pages_foreign(NULL, mm, vaddr, 1, 0, 1, &old_page, &vma); if (ret <= 0) return ret; @@ -1699,7 +1699,7 @@ static int is_trap_at_addr(struct mm_str if (likely(result == 0)) goto out; - result = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL); + result = get_user_pages(vaddr, 1, 0, 1, &page, NULL);
Yeah it seems that mm here is current->mm, and using current task instead of NULL affects AFAICS just the min/maj fault counting, but isn't it still a subtle and unintended functional change?
[...]
-__always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm, - unsigned long start, unsigned long nr_pages, +__always_inline long __get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, int write, int force, struct page **pages, unsigned int gup_flags)
This is the IMHO unneeded inconsistency what I mentioned above...
diff -puN mm/process_vm_access.c~get_current_user_pages mm/process_vm_access.c --- a/mm/process_vm_access.c~get_current_user_pages 2016-01-15 09:45:42.120046515 -0800 +++ b/mm/process_vm_access.c 2016-01-15 09:45:42.157048177 -0800 @@ -99,8 +99,10 @@ static int process_vm_rw_single_vec(unsi size_t bytes; /* Get the pages we're interested in */ - pages = get_user_pages_unlocked(task, mm, pa, pages, - vm_write, 0, process_pages); + down_read(&mm->mmap_sem); + pages = get_user_pages_foreign(task, mm, pa, pages, vm_write, + 0, process_pages, NULL); + up_read(&mm->mmap_sem);
You could have simply used __get_user_pages_unlocked() if it wasn't changed.
@@ -80,7 +80,7 @@ static void async_pf_execute(struct work might_sleep(); - get_user_pages_unlocked(NULL, mm, addr, 1, 1, 0, NULL); + get_user_pages_unlocked(addr, 1, 1, 0, NULL);
This seems to get mm from some structure where struct work is embedded, are you sure it's current->mm? I think it's another place for __get_user_pages_unlocked().
static inline int check_user_page_hwpoison(unsigned long addr) @@ -1344,12 +1345,10 @@ static int hva_to_pfn_slow(unsigned long if (async) { down_read(¤t->mm->mmap_sem); - npages = get_user_page_nowait(current, current->mm, - addr, write_fault, page); + npages = get_user_page_nowait(addr, write_fault, page); up_read(¤t->mm->mmap_sem); } else - npages = __get_user_pages_unlocked(current, current->mm, addr, 1, - write_fault, 0, page, + npages = __get_user_pages_unlocked(addr, 1, write_fault, 0, page, FOLL_TOUCH|FOLL_HWPOISON); if (npages != 1) return npages;
If you change __get_user_pages_unlocked() as I suggested, you could use get_user_pages_unlocked() here?
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>