On Fri, May 24, 2024 at 10:28:58PM +0300, Adrian Ratiu wrote:
> Prior to v2.6.39 write access to /proc/<pid>/mem was restricted,
> after which it got allowed in commit 198214a7ee50 ("proc: enable
> writing to /proc/pid/mem"). Famous last words from that patch:
> "no longer a security hazard". :)
>
> Afterwards exploits started causing drama like [1]. The exploits
> using /proc/*/mem can be rather sophisticated like [2] which
> installed an arbitrary payload from noexec storage into a running
> process then exec'd it, which itself could include an ELF loader
> to run arbitrary code off noexec storage.
>
> One of the well-known problems with /proc/*/mem writes is they
> ignore page permissions via FOLL_FORCE, as opposed to writes via
> process_vm_writev which respect page permissions. These writes can
> also be used to bypass mode bits.
>
> To harden against these types of attacks, distributions might want
> to restrict /proc/pid/mem accesses, either entirely or partially,
> e.g. to restrict FOLL_FORCE usage.
>
> Known valid use-cases which still need these accesses are:
>
> * Debuggers which also have ptrace permissions, so they can access
>   memory anyway via PTRACE_POKEDATA & co. Some debuggers like GDB
>   are designed to write /proc/pid/mem for basic functionality.
>
> * Container supervisors using the seccomp notifier to intercept
>   syscalls and rewrite memory of calling processes by passing
>   around /proc/pid/mem file descriptors.
>
> There might be more, that's why these params default to disabled.
>
> Regarding other mechanisms which can block these accesses:
>
> * seccomp filters can be used to block mmap/mprotect calls with W|X
>   perms, but they often can't block open calls as daemons want to
>   read/write their runtime state and seccomp filters cannot check
>   file paths, so plain write calls can't be easily blocked.
>
> * Since the mem file is part of the dynamic /proc/<pid>/ space, we
>   can't run chmod once at boot to restrict it (and trying to react
>   to every process and run chmod doesn't scale, and the kernel no
>   longer allows chmod on any of these paths).
>
> * SELinux could be used with a rule to cover all /proc/*/mem files,
>   but even then having multiple ways to deny an attack is useful in
>   case one layer fails.
>
> Thus we introduce four kernel parameters to restrict /proc/*/mem
> access: open-read, open-write, write and foll_force. All these can
> be independently set to the following values:
>
>   all => restrict all access unconditionally.
>   ptracer => restrict all access except for ptracer processes.
>
> If left unset, the existing behaviour is preserved, i.e. access
> is governed by basic file permissions.
>
> Examples which can be passed by bootloaders:
>
>   proc_mem.restrict_foll_force=all
>   proc_mem.restrict_open_write=ptracer
>   proc_mem.restrict_open_read=ptracer
>   proc_mem.restrict_write=all
>
> These knobs can also be enabled via Kconfig, e.g.:
>
>   CONFIG_PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT=y
>   CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT=y
>
> Each distribution needs to decide what restrictions to apply,
> depending on its use-cases. Embedded systems might want to do
> more, while general-purpose distros might want a more relaxed
> policy, because e.g. foll_force=all and write=all both
> break GDB, so it might be a bit excessive.
>
> Based on an initial patch by Mike Frysinger <vapier@xxxxxxxxxxxx>.
>
> Link: https://lwn.net/Articles/476947/ [1]
> Link: https://issues.chromium.org/issues/40089045 [2]
> Cc: Guenter Roeck <groeck@xxxxxxxxxxxx>
> Cc: Doug Anderson <dianders@xxxxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Jann Horn <jannh@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> Cc: Christian Brauner <brauner@xxxxxxxxxx>
> Co-developed-by: Mike Frysinger <vapier@xxxxxxxxxxxx>
> Signed-off-by: Mike Frysinger <vapier@xxxxxxxxxxxx>
> Signed-off-by: Adrian Ratiu <adrian.ratiu@xxxxxxxxxxxxx>
> ---
> Changes in v4:
> * Renamed parameters to use a fake namespace and respect
>   subject-verb-object pattern (e.g. proc_mem.restrict_read)
> * Replaced static key array with individual definitions.
>   Still need 6 key definitions because we need to store 3
>   states for each parameter, e.g. read all/ptrace/DAC states,
>   so we need 2 keys for each parameter -- they will not fit
>   into just 1 static key.
> * Replaced strncmp -> strcmp and dropped redundant helper,
>   significantly simplified DEFINE_EARLY_PROC_MEM_RESTRICT
>   macro.
> * Dropped else from __mem_open_check_access_restriction()
> * Moved ptracer check to proc_mem_open to avoid ToCToU
> * Added extra mm_access() check for the mem_rw() case
> * Found a use case for blocking just writes independent
>   of open restrictions, so added a new param
> * Added *_DEFAULT Kconfigs
> ---
>  .../admin-guide/kernel-parameters.txt         |  38 ++++++
>  fs/proc/base.c                                | 124 +++++++++++++++++-
>  security/Kconfig                              |  68 ++++++++++
>  3 files changed, 229 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 500cfa776225..3fdfeaefccf2 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4792,6 +4792,44 @@
>  	printk.time=	Show timing data prefixed to each printk message line
>  			Format: <bool> (1/Y/y=enable, 0/N/n=disable)
>
> +	proc_mem.restrict_foll_force= [KNL]
> +			Format: {all | ptracer}
> +			Restricts the use of the FOLL_FORCE flag for /proc/*/mem access.
> +			If restricted, the FOLL_FORCE flag will not be added to vm accesses.
> +			Can be one of:
> +			- 'all' restricts all access unconditionally.
> +			- 'ptracer' allows access only for ptracer processes.
> +			If not specified, FOLL_FORCE is always used.
> +
> +	proc_mem.restrict_open_read= [KNL]
> +			Format: {all | ptracer}
> +			Allows restricting read access to /proc/*/mem files during open().
> +			Depending on restriction level, open for reads return -EACCES.
> +			Can be one of:
> +			- 'all' restricts all access unconditionally.
> +			- 'ptracer' allows access only for ptracer processes.
> +			If not specified, then basic file permissions continue to apply.
> +
> +	proc_mem.restrict_open_write= [KNL]
> +			Format: {all | ptracer}
> +			Allows restricting write access to /proc/*/mem files during open().
> +			Depending on restriction level, open for writes return -EACCES.
> +			Can be one of:
> +			- 'all' restricts all access unconditionally.
> +			- 'ptracer' allows access only for ptracer processes.
> +			If not specified, then basic file permissions continue to apply.
> +
> +	proc_mem.restrict_write= [KNL]
> +			Format: {all | ptracer}
> +			Allows restricting write access to /proc/*/mem after the files
> +			have been opened, during the actual write calls. This is useful for
> +			systems which can't block writes earlier during open().
> +			Depending on restriction level, writes will return -EACCES.
> +			Can be one of:
> +			- 'all' restricts all access unconditionally.
> +			- 'ptracer' allows access only for ptracer processes.
> +			If not specified, then basic file permissions continue to apply.
> +
>  	processor.max_cstate=	[HW,ACPI]
>  			Limit processor to maximum C-state
>  			max_cstate=9 overrides any DMI blacklist limit.
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 6faf1b3a4117..9223eaaf055b 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -152,6 +152,30 @@ struct pid_entry {
>  		NULL, &proc_pid_attr_operations, \
>  		{ .lsmid = LSMID })
>
> +#define DEFINE_EARLY_PROC_MEM_RESTRICT(CFG, name) \
> +DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_PROC_MEM_RESTRICT_##CFG##_DEFAULT, \
> +			   proc_mem_restrict_##name##_all); \
> +DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_PROC_MEM_RESTRICT_##CFG##_PTRACE_DEFAULT, \
> +			   proc_mem_restrict_##name##_ptracer); \
> +	\
> +static int __init early_proc_mem_restrict_##name(char *buf) \
> +{ \
> +	if (!buf) \
> +		return -EINVAL; \
> +	\
> +	if (strcmp(buf, "all") == 0) \
> +		static_key_slow_inc(&proc_mem_restrict_##name##_all.key); \
> +	else if (strcmp(buf, "ptracer") == 0) \
> +		static_key_slow_inc(&proc_mem_restrict_##name##_ptracer.key); \
> +	return 0; \
> +} \
> +early_param("proc_mem.restrict_" #name, early_proc_mem_restrict_##name)
> +
> +DEFINE_EARLY_PROC_MEM_RESTRICT(OPEN_READ, open_read);
> +DEFINE_EARLY_PROC_MEM_RESTRICT(OPEN_WRITE, open_write);
> +DEFINE_EARLY_PROC_MEM_RESTRICT(WRITE, write);
> +DEFINE_EARLY_PROC_MEM_RESTRICT(FOLL_FORCE, foll_force);
> +
>  /*
>   * Count the number of hardlinks for the pid_entry table, excluding the .
>   * and .. links.
> @@ -794,12 +818,56 @@ static const struct file_operations proc_single_file_operations = {
>  };
>
>
> +static int __mem_open_access_permitted(struct file *file, struct task_struct *task)
> +{
> +	bool is_ptracer;
> +
> +	rcu_read_lock();
> +	is_ptracer = current == ptrace_parent(task);
> +	rcu_read_unlock();
> +
> +	if (file->f_mode & FMODE_WRITE) {
> +		/* Deny if writes are unconditionally disabled via param */
> +		if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_DEFAULT,
> +					&proc_mem_restrict_open_write_all))
> +			return -EACCES;
> +
> +		/* Deny if writes are allowed only for ptracers via param */
> +		if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_PTRACE_DEFAULT,
> +					&proc_mem_restrict_open_write_ptracer) &&
> +		    !is_ptracer)
> +			return -EACCES;
> +	}
> +
> +	if (file->f_mode & FMODE_READ) {
> +		/* Deny if reads are unconditionally disabled via param */
> +		if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_READ_DEFAULT,
> +					&proc_mem_restrict_open_read_all))
> +			return -EACCES;
> +
> +		/* Deny if reads are allowed only for ptracers via param */
> +		if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_READ_PTRACE_DEFAULT,
> +					&proc_mem_restrict_open_read_ptracer) &&
> +		    !is_ptracer)
> +			return -EACCES;
> +	}
> +
> +	return 0; /* R/W are not restricted */
> +}
> +
>  struct mm_struct *proc_mem_open(struct file *file, unsigned int mode)
>  {
>  	struct task_struct *task = get_proc_task(file->f_inode);
>  	struct mm_struct *mm = ERR_PTR(-ESRCH);
> +	int ret;
>
>  	if (task) {
> +		ret = __mem_open_access_permitted(file, task);
> +		if (ret) {
> +			put_task_struct(task);
> +			return ERR_PTR(ret);
> +		}
> +
>  		mm = mm_access(task, mode | PTRACE_MODE_FSCREDS);
>  		put_task_struct(task);
>
> @@ -835,6 +903,56 @@ static int mem_open(struct inode *inode, struct file *file)
>  	return ret;
>  }
>
> +static bool __mem_rw_current_is_ptracer(struct file *file)
> +{
> +	struct inode *inode = file_inode(file);
> +	struct task_struct *task = get_proc_task(inode);
> +	int is_ptracer = false, has_mm_access = false;
> +
> +	if (task) {
> +		rcu_read_lock();
> +		is_ptracer = current == ptrace_parent(task);
> +		rcu_read_unlock();
> +
> +		has_mm_access = file->private_data == mm_access(task, PTRACE_MODE_READ_FSCREDS);
> +		put_task_struct(task);
> +	}
> +
> +	return is_ptracer && has_mm_access;
> +}

This is much improved; thanks! One resource leak is here, though:
mm_access() takes a reference count on the mm, and that reference has
to be dropped whether or not the mm matches file->private_data.
mm_access() can also return NULL or an ERR_PTR(), so you'll need
something like:

...
	if (task) {
		struct mm_struct *mm;

		rcu_read_lock();
		is_ptracer = current == ptrace_parent(task);
		rcu_read_unlock();

		mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
		if (!IS_ERR_OR_NULL(mm)) {
			has_mm_access = file->private_data == mm;
			/* Drop the reference taken by mm_access() */
			mmput(mm);
		}
		put_task_struct(task);
	}
...

> +
> +static unsigned int __mem_rw_get_foll_force_flag(struct file *file)
> +{
> +	/* Deny if FOLL_FORCE is disabled via param */
> +	if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_DEFAULT,
> +				&proc_mem_restrict_foll_force_all))
> +		return 0;
> +
> +	/* Deny if FOLL_FORCE is allowed only for ptracers via param */
> +	if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT,
> +				&proc_mem_restrict_foll_force_ptracer) &&
> +	    !__mem_rw_current_is_ptracer(file))
> +		return 0;
> +
> +	return FOLL_FORCE;
> +}
> +
> +static bool __mem_rw_block_writes(struct file *file)
> +{
> +	/* Block if writes are disabled via param proc_mem.restrict_write=all */
> +	if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_WRITE_DEFAULT,
> +				&proc_mem_restrict_write_all))
> +		return true;
> +
> +	/* Block with an exception only for ptracers */
> +	if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT,
> +				&proc_mem_restrict_write_ptracer) &&
> +	    !__mem_rw_current_is_ptracer(file))
> +		return true;
> +
> +	return false;
> +}
> +
>  static ssize_t mem_rw(struct file *file, char __user *buf,
>  			size_t count, loff_t *ppos, int write)
>  {
> @@ -847,6 +965,9 @@ static ssize_t mem_rw(struct file *file, char __user *buf,
>  	if (!mm)
>  		return 0;
>
> +	if (write && __mem_rw_block_writes(file))
> +		return -EACCES;
> +
>  	page = (char *)__get_free_page(GFP_KERNEL);
>  	if (!page)
>  		return -ENOMEM;
> @@ -855,7 +976,8 @@ static ssize_t mem_rw(struct file *file, char __user *buf,
>  	if (!mmget_not_zero(mm))
>  		goto free;
>
> -	flags = FOLL_FORCE | (write ? FOLL_WRITE : 0);
> +	flags = (write ? FOLL_WRITE : 0);
> +	flags |= __mem_rw_get_foll_force_flag(file);
>
>  	while (count > 0) {
>  		size_t this_len = min_t(size_t, count, PAGE_SIZE);
> diff --git a/security/Kconfig b/security/Kconfig
> index 412e76f1575d..0cd73f848b5a 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -183,6 +183,74 @@ config STATIC_USERMODEHELPER_PATH
>  	  If you wish for all usermode helper programs to be disabled,
>  	  specify an empty string here (i.e. "").
>
> +menu "Procfs mem restriction options"
> +
> +config PROC_MEM_RESTRICT_FOLL_FORCE_DEFAULT
> +	bool "Restrict all FOLL_FORCE flag usage"
> +	default n
> +	help
> +	  Restrict all FOLL_FORCE usage during /proc/*/mem RW.
> +	  Debuggers like GDB require using FOLL_FORCE for basic
> +	  functionality.
> +
> +config PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT
> +	bool "Restrict FOLL_FORCE usage except for ptracers"
> +	default n
> +	help
> +	  Restrict FOLL_FORCE usage during /proc/*/mem RW, except
> +	  for ptracer processes. Debuggers like GDB require using
> +	  FOLL_FORCE for basic functionality.
> +
> +config PROC_MEM_RESTRICT_OPEN_READ_DEFAULT
> +	bool "Restrict all open() read access"
> +	default n
> +	help
> +	  Restrict all open() read access to /proc/*/mem files.
> +	  Use with caution: this can break init systems, debuggers,
> +	  container supervisors and other tasks using /proc/*/mem.
> +
> +config PROC_MEM_RESTRICT_OPEN_READ_PTRACE_DEFAULT
> +	bool "Restrict open() for reads except for ptracers"
> +	default n
> +	help
> +	  Restrict open() read access except for ptracer processes.
> +	  Use with caution: this can break init systems, debuggers,
> +	  container supervisors and other non-ptrace capable tasks
> +	  using /proc/*/mem.
> +
> +config PROC_MEM_RESTRICT_OPEN_WRITE_DEFAULT
> +	bool "Restrict all open() write access"
> +	default n
> +	help
> +	  Restrict all open() write access to /proc/*/mem files.
> +	  Debuggers like GDB and some container supervisor tasks
> +	  require opening as RW and may break.
> +
> +config PROC_MEM_RESTRICT_OPEN_WRITE_PTRACE_DEFAULT
> +	bool "Restrict open() for writes except for ptracers"
> +	default n
> +	help
> +	  Restrict open() write access except for ptracer processes,
> +	  usually debuggers.
> +
> +config PROC_MEM_RESTRICT_WRITE_DEFAULT
> +	bool "Restrict all write() calls"
> +	default n
> +	help
> +	  Restrict all /proc/*/mem direct write calls.
> +	  Open calls with RW modes are still allowed; this blocks
> +	  just the write() calls.
> +
> +config PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT
> +	bool "Restrict write() calls except for ptracers"
> +	default n
> +	help
> +	  Restrict /proc/*/mem direct write calls except for ptracer processes.
> +	  Open calls with RW modes are still allowed; this blocks just
> +	  the write() calls.
> +
> +endmenu
> +
>  source "security/selinux/Kconfig"
>  source "security/smack/Kconfig"
>  source "security/tomoyo/Kconfig"
> --
> 2.44.1

I think this looks really close.

--
Kees Cook