The patch titled
     Subject: mm/madvise: remove CAP_SYS_ADMIN requirement for process_madvise(MADV_COLLAPSE)
has been added to the -mm mm-unstable branch.  Its filename is
     mm-madvise-add-madv_collapse-to-process_madvise-fix.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-madvise-add-madv_collapse-to-process_madvise-fix.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: "Zach O'Keefe" <zokeefe@xxxxxxxxxx>
Subject: mm/madvise: remove CAP_SYS_ADMIN requirement for process_madvise(MADV_COLLAPSE)
Date: Mon, 1 Aug 2022 14:09:46 -0700

process_madvise(MADV_COLLAPSE) currently requires CAP_SYS_ADMIN when not
acting on the caller's own mm. This is maximally restrictive, and
perpetuates existing issues with CAP_SYS_ADMIN. Remove this requirement.

When acting on an external process' memory, the biggest concerns for
process_madvise(MADV_COLLAPSE) are (1) being able to influence process
performance by moving memory, possibly between nodes, that is mapped into
the address space of external process(es), (2) defeat of
address-space-layout randomization, and (3) being able to increase
process RSS and memcg usage, possibly causing memcg OOM.

process_madvise(2) already enforces CAP_SYS_NICE and PTRACE_MODE_READ (in
PTRACE_MODE_FSCREDS mode).
A process with these credentials can already accomplish (1) and (2) via
move_pages(MPOL_MF_MOVE_ALL), and (3) via process_madvise(MADV_WILLNEED).

process_madvise(MADV_COLLAPSE) may also circumvent sysfs THP settings.
When acting on one's own memory (which is equivalent to
madvise(MADV_COLLAPSE)), this is deemed acceptable, since aside from the
possibility of hoarding available hugepages (which is currently already
possible) no harm to the system can be done. When acting on an external
process' memory, circumventing sysfs THP settings should provide no
additional threat compared to the ones listed. As such, imposing
additional capabilities (such as CAP_SETUID, as a way to ensure the caller
could have just altered the sysfs THP settings themselves) provides no
extra protection.

Link: https://lkml.kernel.org/r/20220801210946.3069083-1-zokeefe@xxxxxxxxxx
Fixes: 7ec952341312 ("mm/madvise: add MADV_COLLAPSE to process_madvise()")
Signed-off-by: Zach O'Keefe <zokeefe@xxxxxxxxxx>
Cc: Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
Cc: Chris Kennelly <ckennelly@xxxxxxxxxx>
Cc: Chris Zankel <chris@xxxxxxxxxx>
Cc: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Helge Deller <deller@xxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Ivan Kokshaysky <ink@xxxxxxxxxxxxxxxxxxxx>
Cc: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Matt Turner <mattst88@xxxxxxxxx>
Cc: Max Filippov <jcmvbkbc@xxxxxxxxx>
Cc: Miaohe Lin <linmiaohe@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>
Cc: Pavel Begunkov <asml.silence@xxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: Rongwei Wang <rongwei.wang@xxxxxxxxxxxxxxxxx>
Cc: SeongJae Park <sj@xxxxxxxxxx>
Cc: Song Liu <songliubraving@xxxxxx>
Cc: "Souptick Joarder (HPE)" <jrdr.linux@xxxxxxxxx>
Cc: Thomas Bogendoerfer <tsbogend@xxxxxxxxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Yang Shi <shy828301@xxxxxxxxx>
Cc: Zi Yan <ziy@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/madvise.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

--- a/mm/madvise.c~mm-madvise-add-madv_collapse-to-process_madvise-fix
+++ a/mm/madvise.c
@@ -1170,16 +1170,14 @@ madvise_behavior_valid(int behavior)
 	}
 }
 
-static bool
-process_madvise_behavior_valid(int behavior, struct task_struct *task)
+static bool process_madvise_behavior_valid(int behavior)
 {
 	switch (behavior) {
 	case MADV_COLD:
 	case MADV_PAGEOUT:
 	case MADV_WILLNEED:
-		return true;
 	case MADV_COLLAPSE:
-		return task == current || capable(CAP_SYS_ADMIN);
+		return true;
 	default:
 		return false;
 	}
@@ -1457,7 +1455,7 @@ SYSCALL_DEFINE5(process_madvise, int, pi
 		goto free_iov;
 	}
 
-	if (!process_madvise_behavior_valid(behavior, task)) {
+	if (!process_madvise_behavior_valid(behavior)) {
 		ret = -EINVAL;
 		goto release_task;
 	}
_

Patches currently in -mm which might be from zokeefe@xxxxxxxxxx are

mm-khugepaged-add-struct-collapse_control.patch
mm-khugepaged-add-struct-collapse_control-fix.patch
mm-khugepaged-dedup-and-simplify-hugepage-alloc-and-charging.patch
mm-khugepaged-pipe-enum-scan_result-codes-back-to-callers.patch
mm-khugepaged-add-flag-to-predicate-khugepaged-only-behavior.patch
mm-thp-add-flag-to-enforce-sysfs-thp-in-hugepage_vma_check.patch
mm-khugepaged-add-flag-to-predicate-khugepaged-only-behavior-fix.patch
mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage.patch
mm-madvise-introduce-madv_collapse-sync-hugepage-collapse.patch
mm-madvise-introduce-madv_collapse-sync-hugepage-collapse-fix-2.patch
mm-madvise-introduce-madv_collapse-sync-hugepage-collapse-fix-3.patch
mm-khugepaged-rename-prefix-of-shared-collapse-functions.patch
mm-madvise-add-madv_collapse-to-process_madvise.patch
mm-madvise-add-madv_collapse-to-process_madvise-fix.patch
selftests-vm-modularize-collapse-selftests.patch
selftests-vm-dedup-hugepage-allocation-logic.patch
selftests-vm-add-madv_collapse-collapse-context-to-selftests.patch
selftests-vm-add-selftest-to-verify-recollapse-of-thps.patch
selftests-vm-add-selftest-to-verify-multi-thp-collapse.patch
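To make the effect of the mm/madvise.c diff concrete from the caller's side, here is a minimal userspace sketch, assuming a Linux/glibc build environment. behavior_valid_before() and behavior_valid_after() are hypothetical mirrors of process_madvise_behavior_valid() on either side of the diff; collapse_remote() is a hypothetical helper that opens a pidfd for the target and issues process_madvise(MADV_COLLAPSE). The fallback constants (MADV_COLD 20, MADV_PAGEOUT 21, MADV_COLLAPSE 25, and syscall numbers 434/440) are assumptions matching the generic uapi headers; some architectures (alpha, for the syscall numbers) use different values.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallbacks for older headers. Assumption: generic uapi values. */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434		/* assumption: not valid on alpha */
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440		/* assumption: not valid on alpha */
#endif

/* Hypothetical mirror of process_madvise_behavior_valid() before the fix:
 * MADV_COLLAPSE on another process's mm needed CAP_SYS_ADMIN. */
bool behavior_valid_before(int behavior, bool target_is_self, bool has_sys_admin)
{
	switch (behavior) {
	case MADV_COLD:
	case MADV_PAGEOUT:
	case MADV_WILLNEED:
		return true;
	case MADV_COLLAPSE:
		return target_is_self || has_sys_admin;
	default:
		return false;	/* the syscall then fails with -EINVAL */
	}
}

/* Hypothetical mirror after the fix: MADV_COLLAPSE is accepted like the
 * other hints; the caller is still subject to the CAP_SYS_NICE and
 * PTRACE_MODE_READ checks that process_madvise(2) performs elsewhere. */
bool behavior_valid_after(int behavior)
{
	switch (behavior) {
	case MADV_COLD:
	case MADV_PAGEOUT:
	case MADV_WILLNEED:
	case MADV_COLLAPSE:
		return true;
	default:
		return false;
	}
}

/* Hypothetical helper: ask the kernel to collapse [addr, addr + len) in
 * process `pid` into THPs. Returns the number of bytes advised, or -1
 * with errno set. */
ssize_t collapse_remote(pid_t pid, void *addr, size_t len)
{
	int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	struct iovec iov = { .iov_base = addr, .iov_len = len };
	ssize_t ret = syscall(SYS_process_madvise, pidfd, &iov, (size_t)1,
			      MADV_COLLAPSE, 0U);

	int saved_errno = errno;	/* keep close() from clobbering errno */
	close(pidfd);
	errno = saved_errno;
	return ret;
}
```

For example, a tool with CAP_SYS_NICE and ptrace-read access to a target might call collapse_remote(pid, addr, 2UL << 20) on a 2 MiB-aligned range (PMD size on x86-64 with 4K pages). Because the old check lived in process_madvise_behavior_valid(), a caller without CAP_SYS_ADMIN targeting another process used to get EINVAL rather than EPERM; with this fix that path returns true.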