Hello On Monday, February 26, 2024 21:24 EET, Kees Cook <keescook@xxxxxxxxxxxx> wrote: > [sorry for the duplicate, fixing Jann's email address] > > On Mon, Feb 26, 2024 at 09:10:54AM -0800, Doug Anderson wrote: > > Hi, > > > > On Wed, Feb 21, 2024 at 1:06 PM Adrian Ratiu <adrian.ratiu@xxxxxxxxxxxxx> wrote: > > > > > > Prior to v2.6.39 write access to /proc/<pid>/mem was restricted, > > > after which it got allowed in commit 198214a7ee50 ("proc: enable > > > writing to /proc/pid/mem"). Famous last words from that patch: > > > "no longer a security hazard". :) > > > > > > Afterwards exploits appeared started causing drama like [1]. The > > > /proc/*/mem exploits can be rather sophisticated like [2] which > > > installed an arbitrary payload from noexec storage into a running > > > process then exec'd it, which itself could include an ELF loader > > > to run arbitrary code off noexec storage. > > > > > > As part of hardening against these types of attacks, distrbutions > > > can restrict /proc/*/mem to only allow writes when they makes sense, > > > like in case of debuggers which have ptrace permissions, as they > > > are able to access memory anyway via PTRACE_POKEDATA and friends. > > > > > > Dropping the mode bits disables write access for non-root users. > > > Trying to `chmod` the paths back fails as the kernel rejects it. > > > > > > For users with CAP_DAC_OVERRIDE (usually just root) we have to > > > disable the mem_write callback to avoid bypassing the mode bits. > > > > > > Writes can be used to bypass permissions on memory maps, even if a > > > memory region is mapped r-x (as is a program's executable pages), > > > the process can open its own /proc/self/mem file and write to the > > > pages directly. > > > > > > Even if seccomp filters block mmap/mprotect calls with W|X perms, > > > they often cannot block open calls as daemons want to read/write > > > their own runtime state and seccomp filters cannot check file paths. > > > Write calls also can't be blocked in general via seccomp. > > > > > > Since the mem file is part of the dynamic /proc/<pid>/ space, we > > > can't run chmod once at boot to restrict it (and trying to react > > > to every process and run chmod doesn't scale, and the kernel no > > > longer allows chmod on any of these paths). > > > > > > SELinux could be used with a rule to cover all /proc/*/mem files, > > > but even then having multiple ways to deny an attack is useful in > > > case on layer fails. > > > > > > [1] https://lwn.net/Articles/476947/ > > > [2] https://issues.chromium.org/issues/40089045 > > > > > > Based on an initial patch by Mike Frysinger <vapier@xxxxxxxxxxxx>. > > > > > > Cc: Guenter Roeck <groeck@xxxxxxxxxxxx> > > > Cc: Doug Anderson <dianders@xxxxxxxxxxxx> > > > Signed-off-by: Mike Frysinger <vapier@xxxxxxxxxxxx> > > This should have a "Co-developed-by: Mike..." tag, since you're making > changes and not just passing it along directly. Thanks, I'll address this in v2. > > > > Signed-off-by: Adrian Ratiu <adrian.ratiu@xxxxxxxxxxxxx> > > > --- > > > Tested on next-20240220. > > > > > > I would really like to avoid depending on CONFIG_MEMCG which is > > > required for the struct mm_stryct "owner" pointer. > > > > > > Any suggestions how check the ptrace owner without MEMCG? > > > --- > > > fs/proc/base.c | 26 ++++++++++++++++++++++++-- > > > security/Kconfig | 13 +++++++++++++ > > > 2 files changed, 37 insertions(+), 2 deletions(-) > > > > Thanks for posting this! This looks reasonable to me, but I'm nowhere > > near an expert on this so I won't add a Reviewed-by tag. > > > > This feels like the kind of thing that Kees might be interested in > > reviewing, so adding him to the "To" list. > > I'd love to make /proc/$pid/mem more strict. A few comments: > > > [...] > > + if (ptracer_capable(current, mm->user_ns) && > > It really looks like you're trying to do a form of ptrace_may_access(), > but _without_ the introspection exception? > > Also, using "current" in the write path can lead to problems[1], so this > should somehow use file->f_cred, or limit write access during the open() > instead? I think Mike explained pretty well why we need to check if current already is a ptracer. The point you raise is valid (thanks again) so we need to check a bit earlier, during open(). > > > [...] > > +config SECURITY_PROC_MEM_RESTRICT_WRITES > > Instead of a build-time CONFIG, I'd prefer a boot-time config (or a > sysctl, but that's be harder given the perms). That this is selectable > by distro users, etc, and they don't need to rebuild their kernel to > benefit from it. Ack, I'll implement a cmdline arg in v2. > > Jann Horn has tried to restrict access to this file in the past as well, > so he may have some additional advice about it. I'll leave this a few more days in case others have more ideas, then will send v2 and also add Jann to the "To:" list. > > -Kees > > [1] https://docs.kernel.org/security/credentials.html#open-file-credentials > > -- > Kees Cook