Re: [PATCH v1 1/3] sysctl: Allow change system v ipc sysctls inside ipc namespace

"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> · Mon, 19 Sep 2022 10:26:39 -0500

Alexey Gladkov <legion@xxxxxxxxxx> writes:

> Rootless containers are not allowed to modify kernel IPC parameters.
>
> All default limits are set to such high values that in fact there are no
> limits at all. All limits are not inherited and are initialized to
> default values when a new ipc_namespace is created.
>
> For new ipc_namespace:
>
> size_t       ipc_ns.shm_ctlmax = SHMMAX; // (ULONG_MAX - (1UL << 24))
> size_t       ipc_ns.shm_ctlall = SHMALL; // (ULONG_MAX - (1UL << 24))
> int          ipc_ns.shm_ctlmni = IPCMNI; // (1 << 15)
> int          ipc_ns.shm_rmid_forced = 0;
> unsigned int ipc_ns.msg_ctlmax = MSGMAX; // 8192
> unsigned int ipc_ns.msg_ctlmni = MSGMNI; // 32000
> unsigned int ipc_ns.msg_ctlmnb = MSGMNB; // 16384
>
> The shm_tot (total amount of shared pages) has also ceased to be
> global, it is located in ipc_namespace and is not inherited from
> anywhere.
>
> In such conditions, it cannot be said that these limits limit anything.
> The real limiter for them is cgroups.
>
> If we allow rootless containers to change these parameters, then it can
> only be reduced.

Manfred does that analysis sound correct to you?
Do you have any concerns about allowing the users who create the ipc
namespace to be able to set it's limits?

At a quick skim through everything Alex's analysis above appears
correct to me.

>From 10,000 feet this patch looks good.  I do see a couple of nits
that should be fixed before we merge this into Linus's tree.

> Signed-off-by: Alexey Gladkov <legion@xxxxxxxxxx>
> ---
>  ipc/ipc_sysctl.c | 34 +++++++++++++++++++++++++++++-----
>  1 file changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> index ef313ecfb53a..87eb1b1e42fa 100644
> --- a/ipc/ipc_sysctl.c
> +++ b/ipc/ipc_sysctl.c
> @@ -192,23 +192,47 @@ static int set_is_seen(struct ctl_table_set *set)
>  
>  static int ipc_permissions(struct ctl_table_header *head, struct ctl_table *table)
>  {
> -	int mode = table->mode;
> -
> -#ifdef CONFIG_CHECKPOINT_RESTORE
>  	struct ipc_namespace *ns = current->nsproxy->ipc_ns;
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Historically that was the best we could do.  But now that we have
an ipc_set member in struct ipc_namespace you can use container_of
to compute this value.

For a permission check that is much safer.

> +#ifdef CONFIG_CHECKPOINT_RESTORE
>  	if (((table->data == &ns->ids[IPC_SEM_IDS].next_id) ||
>  	     (table->data == &ns->ids[IPC_MSG_IDS].next_id) ||
>  	     (table->data == &ns->ids[IPC_SHM_IDS].next_id)) &&
>  	    checkpoint_restore_ns_capable(ns->user_ns))
> -		mode = 0666;
> +		return 0666;
>  #endif
> -	return mode;
> +	if (ns->user_ns != &init_user_ns) {
> +		kuid_t ns_root_uid = make_kuid(ns->user_ns, 0);
> +		kgid_t ns_root_gid = make_kgid(ns->user_ns, 0);
> +
> +		if (uid_valid(ns_root_uid) && uid_eq(current_euid(), ns_root_uid))
> +			return table->mode >> 6;
> +
> +		if (gid_valid(ns_root_gid) && in_egroup_p(ns_root_gid))
> +			return table->mode >> 3;

>From 10,000 fee this is fine.  But this has to interact with
test_perm in proc_systl.c.  So can you please do what
net_ctl_permissions does and replicate the chosen mode all through
the mode line.

Perhaps something like:

	kuid_t ns_root_uid;
	kgid_t ns_root_gid

	ipc_set_ownership(head, table, &ns_root_uid, &ns_root_gid);

#ifdef CONFIG_CHECKPOINT_RESTORE
	if (...)
        	mode = 0666;
        else
#endif        
        if (uid_eq(current_euid(), ns_root_uid))
		mode >>= 6;

        else if (uid_eq(in_group_p(ns_root_gid))
		mode >>= 3;

	mode &= 7;
        mode = (mode << 6) | (mode << 3) | mode;
        return mode;

If we always pass through the same logic there is the advantage that we
will always test it, and there is less room for bugs to slip through.

I added a couple of unnecessary simplifications in there that I just
saw as I was writing my example code.

Eric

> +	}
> +
> +	return table->mode;
> +}
> +
> +static void ipc_set_ownership(struct ctl_table_header *head,
> +			      struct ctl_table *table,
> +			      kuid_t *uid, kgid_t *gid)
> +{
> +	struct ipc_namespace *ns =
> +		container_of(head->set, struct ipc_namespace, ipc_set);
> +
> +	kuid_t ns_root_uid = make_kuid(ns->user_ns, 0);
> +	kgid_t ns_root_gid = make_kgid(ns->user_ns, 0);
> +
> +	*uid = uid_valid(ns_root_uid) ? ns_root_uid : GLOBAL_ROOT_UID;
> +	*gid = gid_valid(ns_root_gid) ? ns_root_gid : GLOBAL_ROOT_GID;
>  }
>  
>  static struct ctl_table_root set_root = {
>  	.lookup = set_lookup,
>  	.permissions = ipc_permissions,
> +	.set_ownership = ipc_set_ownership,
>  };
>  
>  bool setup_ipc_sysctls(struct ipc_namespace *ns)

I can't see anything wrong with your proposed ipc_set_ownership.

Eric