From: Jiri Kosina <jkosina@xxxxxxx>

There are ways [1] in which mincore() could be used as a conveyor of
side-channel information about pagecache metadata.

Provide a vm.mincore_privileged sysctl which, when set, makes mincore()
return -EPERM if it is invoked by a process lacking CAP_SYS_ADMIN.

The default behavior stays "mincore() can be used by anybody" in order to
be conservative with respect to userspace behavior.

[1] https://www.theregister.co.uk/2019/01/05/boffins_beat_page_cache/

Signed-off-by: Jiri Kosina <jkosina@xxxxxxx>
---
 Documentation/sysctl/vm.txt | 9 +++++++++
 kernel/sysctl.c             | 8 ++++++++
 mm/mincore.c                | 5 +++++
 3 files changed, 22 insertions(+)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 187ce4f599a2..afb8635e925e 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -41,6 +41,7 @@ Currently, these files are in /proc/sys/vm:
 - min_free_kbytes
 - min_slab_ratio
 - min_unmapped_ratio
+- mincore_privileged
 - mmap_min_addr
 - mmap_rnd_bits
 - mmap_rnd_compat_bits
@@ -485,6 +486,14 @@ files and similar are considered.
 The default is 1 percent.
 
 ==============================================================
+mincore_privileged:
+
+mincore() could potentially be used to mount a side-channel attack against
+pagecache metadata. This sysctl gives system administrators a means to
+make it available only to processes that have the CAP_SYS_ADMIN capability.
+
+The default is 0, which means mincore() can be used without restrictions.
+==============================================================
 
 mmap_min_addr
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 1825f712e73b..f03cb07c8dd4 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -114,6 +114,7 @@ extern unsigned int sysctl_nr_open_min, sysctl_nr_open_max;
 #ifndef CONFIG_MMU
 extern int sysctl_nr_trim_pages;
 #endif
+extern int sysctl_mincore_privileged;
 
 /* Constants used for minimum and maximum */
 #ifdef CONFIG_LOCKUP_DETECTOR
@@ -1684,6 +1685,13 @@ static struct ctl_table vm_table[] = {
 		.extra2		= (void *)&mmap_rnd_compat_bits_max,
 	},
 #endif
+	{
+		.procname	= "mincore_privileged",
+		.data		= &sysctl_mincore_privileged,
+		.maxlen		= sizeof(sysctl_mincore_privileged),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 	{ }
 };
 
diff --git a/mm/mincore.c b/mm/mincore.c
index 218099b5ed31..77d4928cdfaa 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -21,6 +21,8 @@
 #include <linux/uaccess.h>
 #include <asm/pgtable.h>
 
+int sysctl_mincore_privileged;
+
 static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
 {
@@ -228,6 +230,9 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
 	unsigned long pages;
 	unsigned char *tmp;
 
+	if (sysctl_mincore_privileged && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	/* Check the start address: needs to be page-aligned.. */
 	if (start & ~PAGE_MASK)
 		return -EINVAL;
-- 
Jiri Kosina
SUSE Labs
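
For illustration only, here is a minimal userspace sketch of how a caller
might cope with the new failure mode. It is not part of the patch. The
helper probe_mincore() and the enum mincore_status are hypothetical names
made up for this example, and treating EPERM as "vm.mincore_privileged is
set and the caller lacks CAP_SYS_ADMIN" is an assumption based on the
behavior described above.

```c
#define _DEFAULT_SOURCE		/* for mincore() and MAP_ANONYMOUS in glibc */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not a kernel or libc API. */
enum mincore_status {
	MINCORE_OK,	/* residency vector was filled in */
	MINCORE_DENIED,	/* EPERM: assumed to mean the sysctl is set */
	MINCORE_FAILED,	/* any other failure */
};

/*
 * Map npages anonymous pages, touch them, and ask mincore() how many
 * are resident. On MINCORE_OK, *resident holds the count; otherwise
 * it is left at 0.
 */
enum mincore_status probe_mincore(size_t npages, size_t *resident)
{
	enum mincore_status status = MINCORE_FAILED;
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *vec;
	size_t len, i;
	void *map;

	assert(resident != NULL);
	*resident = 0;
	if (page_size <= 0 || npages == 0)
		return MINCORE_FAILED;
	len = npages * (size_t)page_size;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return MINCORE_FAILED;

	vec = calloc(npages, 1);
	if (!vec)
		goto out_unmap;

	memset(map, 0xaa, len);	/* fault every page in */

	if (mincore(map, len, vec) == 0) {
		/* Only the least significant bit of each byte is defined. */
		for (i = 0; i < npages; i++)
			*resident += vec[i] & 1;
		status = MINCORE_OK;
	} else if (errno == EPERM) {
		/* Fall back to not knowing residency at all. */
		status = MINCORE_DENIED;
	}

	free(vec);
out_unmap:
	munmap(map, len);
	return status;
}
```

A caller that only uses mincore() as an optimization hint could treat
MINCORE_DENIED the same as "residency unknown" and carry on, which is the
kind of userspace behavior the conservative default is meant to protect.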