Re: [PATCH v7 5/7] fs,doc: Enable to enforce noexec mounts or file exec through O_MAYEXEC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 11/08/2020 21:58, Eric W. Biederman wrote:
> Mickaël Salaün <mic@xxxxxxxxxxx> writes:
> 
>> Allow for the enforcement of the O_MAYEXEC openat2(2) flag.  Thanks to
>> the noexec option from the underlying VFS mount, or to the file execute
>> permission, userspace can enforce these execution policies.  This may
>> allow script interpreters to check execution permission before reading
>> commands from a file, or dynamic linkers to allow shared object
>> loading.
> 
> Ick!!!!!
> 
> This feels like being so open minded your brains fall out.

Really? :)

> 
> I can see having a sysctl that allows the new open flag to be ignored
> so that the existing lack of enforcement when the flag is passed
> continues.

And that could break distros (i.e. multiple interpreters, with or
without O_MAYEXEC, different versions, and different kernels).

> 
> But having the sysctl be fine grained seems like way too much rope.

This follow the same rational: file permissions and mount options can
change but they are controled by the sysadmin, how also configure sysctl
values.

> 
> I don't think the code needs to do more than enforce or not enforce this
> logic.

I think this is the more viable behavior for an eclectic set of distros
(and different use cases).

> 
> You can test the sysctl once when you process O_MAYEXEC.  But code such
> as may_open should not have the conditional behavior.  It should get an
> appropriate set of flags that are always enforced.  With the madness of
> what to do left at the edge of userspace.

The problem is that this userspace is not in charge of the system-wide
policy, the sysadmin is. As pointed out by the commit message, please
take a look at the rational:
https://lore.kernel.org/lkml/1477d3d7-4b36-afad-7077-a38f42322238@xxxxxxxxxxx/

> 
> Anything else appears to be madness, overengineering, and a failure to
> separate concerns.

Oh! What a conclusion! :)

I'd say it is a pragmatic approach.

> 
> Eric
> 
> 
>> Add a new sysctl fs.open_mayexec_enforce to enable system administrators
>> to enforce two complementary security policies according to the
>> installed system: enforce the noexec mount option, and enforce
>> executable file permission.  Indeed, because of compatibility with
>> installed systems, only system administrators are able to check that
>> this new enforcement is in line with the system mount points and file
>> permissions.  A following patch adds documentation.
>>
>> Being able to restrict execution also enables to protect the kernel by
>> restricting arbitrary syscalls that an attacker could perform with a
>> crafted binary or certain script languages.  It also improves multilevel
>> isolation by reducing the ability of an attacker to use side channels
>> with specific code.  These restrictions can natively be enforced for ELF
>> binaries (with the noexec mount option) but require this kernel
>> extension to properly handle scripts (e.g., Python, Perl).  To get a
>> consistent execution policy, additional memory restrictions should also
>> be enforced (e.g. thanks to SELinux).
>>
>> Because the O_MAYEXEC flag is a meant to enforce a system-wide security
>> policy (but not application-centric policies), it does not make sense
>> for userland to check the sysctl value.  Indeed, this new flag only
>> enables to extend the system ability to enforce a policy thanks to (some
>> trusted) userland collaboration.  Moreover, additional security policies
>> could be managed by LSMs.  This is a best-effort approach from the
>> application developer point of view:
>> https://lore.kernel.org/lkml/1477d3d7-4b36-afad-7077-a38f42322238@xxxxxxxxxxx/
>>
>> Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxx>
>> Reviewed-by: Thibaut Sautereau <thibaut.sautereau@xxxxxxxxxxx>
>> Cc: Aleksa Sarai <cyphar@xxxxxxxxxx>
>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>> Cc: Jonathan Corbet <corbet@xxxxxxx>
>> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
>> Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
>> ---
>>
>> Changes since v6:
>> * Allow opening pipes, block devices and character devices with
>>   O_MAYEXEC when there is no enforced policy, but forbid any non-regular
>>   file opened with O_MAYEXEC otherwise (i.e. for any enforced policy).
>> * Add a paragraph about the non-regular files policy.
>> * Move path_noexec() calls out of the fast-path (suggested by Kees
>>   Cook).
>>
>> Changes since v5:
>> * Remove the static enforcement configuration through Kconfig because it
>>   makes the code more simple like this, and because the current sysctl
>>   configuration can only be set with CAP_SYS_ADMIN, the same way mount
>>   options (i.e. noexec) can be set.  If an harden distro wants to
>>   enforce a configuration, it should restrict capabilities or sysctl
>>   configuration.  Furthermore, an LSM can easily leverage O_MAYEXEC to
>>   fit its need.
>> * Move checks from inode_permission() to may_open() and make the error
>>   codes more consistent according to file types (in line with a previous
>>   commit): opening a directory with O_MAYEXEC returns EISDIR and other
>>   non-regular file types may return EACCES.
>> * In may_open(), when OMAYEXEC_ENFORCE_FILE is set, replace explicit
>>   call to generic_permission() with an artificial MAY_EXEC to avoid
>>   double calls.  This makes sense especially when an LSM policy forbids
>>   execution of a file.
>> * Replace the custom proc_omayexec() with
>>   proc_dointvec_minmax_sysadmin(), and then replace the CAP_MAC_ADMIN
>>   check with a CAP_SYS_ADMIN one (suggested by Kees Cook and Stephen
>>   Smalley).
>> * Use BIT() (suggested by Kees Cook).
>> * Rename variables (suggested by Kees Cook).
>> * Reword the kconfig help.
>> * Import the documentation patch (suggested by Kees Cook):
>>   https://lore.kernel.org/lkml/20200505153156.925111-6-mic@xxxxxxxxxxx/
>> * Update documentation and add LWN.net article.
>>
>> Changes since v4:
>> * Add kernel configuration options to enforce O_MAYEXEC at build time,
>>   and disable the sysctl in such case (requested by James Morris).
>> * Reword commit message.
>>
>> Changes since v3:
>> * Update comment with O_MAYEXEC.
>>
>> Changes since v2:
>> * Cosmetic changes.
>>
>> Changes since v1:
>> * Move code from Yama to the FS subsystem (suggested by Kees Cook).
>> * Make omayexec_inode_permission() static (suggested by Jann Horn).
>> * Use mode 0600 for the sysctl.
>> * Only match regular files (not directories nor other types), which
>>   follows the same semantic as commit 73601ea5b7b1 ("fs/open.c: allow
>>   opening only regular files during execve()").
>> ---
>>  Documentation/admin-guide/sysctl/fs.rst | 49 +++++++++++++++++++++++++
>>  fs/namei.c                              | 24 ++++++++++++
>>  include/linux/fs.h                      |  1 +
>>  kernel/sysctl.c                         | 12 +++++-
>>  4 files changed, 84 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
>> index 2a45119e3331..ce6e2081d3a9 100644
>> --- a/Documentation/admin-guide/sysctl/fs.rst
>> +++ b/Documentation/admin-guide/sysctl/fs.rst
>> @@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs:
>>  - inode-nr
>>  - inode-state
>>  - nr_open
>> +- open_mayexec_enforce
>>  - overflowuid
>>  - overflowgid
>>  - pipe-user-pages-hard
>> @@ -165,6 +166,54 @@ system needs to prune the inode list instead of allocating
>>  more.
>>  
>>  
>> +open_mayexec_enforce
>> +--------------------
>> +
>> +While being ignored by :manpage:`open(2)` and :manpage:`openat(2)`, the
>> +``O_MAYEXEC`` flag can be passed to :manpage:`openat2(2)` to only open regular
>> +files that are expected to be executable.  If the file is not identified as
>> +executable, then the syscall returns -EACCES.  This may allow a script
>> +interpreter to check executable permission before reading commands from a file,
>> +or a dynamic linker to only load executable shared objects.  One interesting
>> +use case is to enforce a "write xor execute" policy through interpreters.
>> +
>> +The ability to restrict code execution must be thought as a system-wide policy,
>> +which first starts by restricting mount points with the ``noexec`` option.
>> +This option is also automatically applied to special filesystems such as /proc .
>> +This prevents files on such mount points to be directly executed by the kernel
>> +or mapped as executable memory (e.g. libraries).  With script interpreters
>> +using the ``O_MAYEXEC`` flag, the executable permission can then be checked
>> +before reading commands from files. This makes it possible to enforce the
>> +``noexec`` at the interpreter level, and thus propagates this security policy
>> +to scripts.  To be fully effective, these interpreters also need to handle the
>> +other ways to execute code: command line parameters (e.g., option ``-e`` for
>> +Perl), module loading (e.g., option ``-m`` for Python), stdin, file sourcing,
>> +environment variables, configuration files, etc.  According to the threat
>> +model, it may be acceptable to allow some script interpreters (e.g. Bash) to
>> +interpret commands from stdin, may it be a TTY or a pipe, because it may not be
>> +enough to (directly) perform syscalls.
>> +
>> +There are two complementary security policies: enforce the ``noexec`` mount
>> +option, and enforce executable file permission.  These policies are handled by
>> +the ``fs.open_mayexec_enforce`` sysctl (writable only with ``CAP_SYS_ADMIN``)
>> +as a bitmask:
>> +
>> +1 - Mount restriction: checks that the mount options for the underlying VFS
>> +    mount do not prevent execution.
>> +
>> +2 - File permission restriction: checks that the to-be-opened file is marked as
>> +    executable for the current process (e.g., POSIX permissions).
>> +
>> +Note that as long as a policy is enforced, opening any non-regular file with
>> +``O_MAYEXEC`` is denied (e.g. TTYs, pipe), even when such a file is marked as
>> +executable or is on an executable mount point.
>> +
>> +Code samples can be found in tools/testing/selftests/openat2/omayexec_test.c
>> +and interpreter patches (for the original O_MAYEXEC version) may be found at
>> +https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC .
>> +See also an overview article: https://lwn.net/Articles/820000/ .
>> +
>> +
>>  overflowgid & overflowuid
>>  -------------------------
>>  
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 3f074ec77390..8ec13c7fd403 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -39,6 +39,7 @@
>>  #include <linux/bitops.h>
>>  #include <linux/init_task.h>
>>  #include <linux/uaccess.h>
>> +#include <linux/sysctl.h>
>>  
>>  #include "internal.h"
>>  #include "mount.h"
>> @@ -425,6 +426,11 @@ static int sb_permission(struct super_block *sb, struct inode *inode, int mask)
>>  	return 0;
>>  }
>>  
>> +#define OPEN_MAYEXEC_ENFORCE_MOUNT	BIT(0)
>> +#define OPEN_MAYEXEC_ENFORCE_FILE	BIT(1)
>> +
>> +int sysctl_open_mayexec_enforce __read_mostly;
>> +
>>  /**
>>   * inode_permission - Check for access rights to a given inode
>>   * @inode: Inode to check permission on
>> @@ -2861,11 +2867,29 @@ static int may_open(const struct path *path, int acc_mode, int flag)
>>  	case S_IFSOCK:
>>  		if (acc_mode & MAY_EXEC)
>>  			return -EACCES;
>> +		/*
>> +		 * Opening devices (e.g. TTYs) or pipes with O_MAYEXEC may be
>> +		 * legitimate when there is no enforced policy.
>> +		 */
>> +		if ((acc_mode & MAY_OPENEXEC) && sysctl_open_mayexec_enforce)
>> +			return -EACCES;
>>  		flag &= ~O_TRUNC;
>>  		break;
>>  	case S_IFREG:
>>  		if ((acc_mode & MAY_EXEC) && path_noexec(path))
>>  			return -EACCES;
>> +		if (acc_mode & MAY_OPENEXEC) {
>> +			if ((sysctl_open_mayexec_enforce & OPEN_MAYEXEC_ENFORCE_MOUNT)
>> +					&& path_noexec(path))
>> +				return -EACCES;
>> +			if (sysctl_open_mayexec_enforce & OPEN_MAYEXEC_ENFORCE_FILE)
>> +				/*
>> +				 * Because acc_mode may change here, the next and only
>> +				 * use of acc_mode should then be by the following call
>> +				 * to inode_permission().
>> +				 */
>> +				acc_mode |= MAY_EXEC;
>> +		}
>>  		break;
>>  	}
>>  
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 56f835c9a87a..071f37707ccc 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -83,6 +83,7 @@ extern int sysctl_protected_symlinks;
>>  extern int sysctl_protected_hardlinks;
>>  extern int sysctl_protected_fifos;
>>  extern int sysctl_protected_regular;
>> +extern int sysctl_open_mayexec_enforce;
>>  
>>  typedef __kernel_rwf_t rwf_t;
>>  
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index db1ce7af2563..5008a2566e79 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -113,6 +113,7 @@ static int sixty = 60;
>>  
>>  static int __maybe_unused neg_one = -1;
>>  static int __maybe_unused two = 2;
>> +static int __maybe_unused three = 3;
>>  static int __maybe_unused four = 4;
>>  static unsigned long zero_ul;
>>  static unsigned long one_ul = 1;
>> @@ -888,7 +889,6 @@ static int proc_taint(struct ctl_table *table, int write,
>>  	return err;
>>  }
>>  
>> -#ifdef CONFIG_PRINTK
>>  static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
>>  				void *buffer, size_t *lenp, loff_t *ppos)
>>  {
>> @@ -897,7 +897,6 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
>>  
>>  	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>>  }
>> -#endif
>>  
>>  /**
>>   * struct do_proc_dointvec_minmax_conv_param - proc_dointvec_minmax() range checking structure
>> @@ -3264,6 +3263,15 @@ static struct ctl_table fs_table[] = {
>>  		.extra1		= SYSCTL_ZERO,
>>  		.extra2		= &two,
>>  	},
>> +	{
>> +		.procname       = "open_mayexec_enforce",
>> +		.data           = &sysctl_open_mayexec_enforce,
>> +		.maxlen         = sizeof(int),
>> +		.mode           = 0600,
>> +		.proc_handler	= proc_dointvec_minmax_sysadmin,
>> +		.extra1		= SYSCTL_ZERO,
>> +		.extra2		= &three,
>> +	},
>>  #if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
>>  	{
>>  		.procname	= "binfmt_misc",



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux