Re: [PATCH] signal: restore the override_rlimit logic

"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> · Fri, 01 Nov 2024 14:51:00 -0500

Roman Gushchin <roman.gushchin@xxxxxxxxx> writes:

> Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of
> ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class
> of signals. However now it's enforced unconditionally, even if
> override_rlimit is set.

Not true.

It added a limit on the number of siginfo structures that
a container may allocate.  Have you tried not limiting your
container?

> This behavior change caused production issues.

> For example, if the limit is reached and a process receives a SIGSEGV
> signal, sigqueue_alloc fails to allocate the necessary resources for the
> signal delivery, preventing the signal from being delivered with
> siginfo. This prevents the process from correctly identifying the fault
> address and handling the error. From the user-space perspective,
> applications are unaware that the limit has been reached and that the
> siginfo is effectively 'corrupted'. This can lead to unpredictable
> behavior and crashes, as we observed with java applications.

Note that there are always conditions under which the allocation may
fail.  The structure is allocated with GFP_ATOMIC, so it is much more
likely to fail than a typical kernel memory allocation.

But I agree it does look like there is a quality of implementation issue
here.

> Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and
> skip the comparison to max there if override_rlimit is set. This
> effectively restores the old behavior.

Instead, please just give the container an unlimited number of siginfo
structures it can play with.

The maximum for rlimit(RLIMIT_SIGPENDING) is the rlimit(RLIMIT_SIGPENDING)
value when the user namespace is created.
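
In practice that means the limit has to be raised before the namespace
is created.  A minimal sketch, assuming the container runtime can run
something like this right before its clone(CLONE_NEWUSER) or
unshare(CLONE_NEWUSER) call; the function name is made up for
illustration.  It only lifts the soft limit to the hard limit, since
going all the way to RLIM_INFINITY needs CAP_SYS_RESOURCE when the hard
limit is lower:

```c
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the soft RLIMIT_SIGPENDING to the hard
 * limit.  Intended to run before the user namespace is created, so
 * that the namespace starts from the larger value.
 * Returns 0 on success, -1 on error with errno set.
 * If out is non-NULL it receives the limits that were applied.
 */
static int raise_sigpending_soft_to_hard(struct rlimit *out)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_SIGPENDING, &rl) != 0)
		return -1;

	rl.rlim_cur = rl.rlim_max;	/* always permitted for unprivileged tasks */
	if (setrlimit(RLIMIT_SIGPENDING, &rl) != 0)
		return -1;

	if (out != NULL)
		*out = rl;
	return 0;
}
```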

Given that it took three and a half years to report this, I am going to
say this really looks like a userspace bug.

Beyond that your patch is actually buggy, and should not be applied.

If we want to change the semantics and ignore the maximum number of
pending signals in a container (when override_rlimit is set) then
the code should change the computation of the max value (pegging it at
LONG_MAX) and not ignore it.

As it is, the patch below disables the check that keeps the ucount
counters from wrapping around.  That makes it possible for someone to
overflow those counters and get into all kinds of trouble.
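
A rough user-space model of the alternative described above, not kernel
code: the struct, its fields and the helper names are invented for
illustration, and plain longs stand in for the atomic counters.  The
point is that override_rlimit only stops max from being lowered below
LONG_MAX, while the "new < 0 || new > max" test still runs at every
level and still catches a wrapped counter:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one level of the ucounts hierarchy. */
struct model_ucounts {
	struct model_ucounts *parent;	/* next level up, NULL at the root */
	long count;			/* models iter->rlimit[type] */
	long ns_max;			/* cap this level places on its parent's counter */
};

/*
 * Wrapping add that models the atomic counter without signed-overflow
 * undefined behaviour (relies on the usual two's complement conversion).
 */
static long wrapping_add(long *v, long delta)
{
	*v = (long)((unsigned long)*v + (unsigned long)delta);
	return *v;
}

/* Returns the new count of the first level, or 0 if any level refused. */
static long model_inc(struct model_ucounts *uc, bool override_rlimit)
{
	struct model_ucounts *iter, *undo;
	long max = LONG_MAX;
	long ret = 0;

	for (iter = uc; iter; iter = iter->parent) {
		long new = wrapping_add(&iter->count, 1);

		/* The wrap check is never skipped, override or not. */
		if (new < 0 || new > max)
			goto unwind;
		if (iter == uc)
			ret = new;
		/* With override_rlimit set, max stays pegged at LONG_MAX. */
		if (!override_rlimit)
			max = iter->ns_max;
	}
	return ret;

unwind:
	/* Undo the levels that succeeded, then the one that failed. */
	for (undo = uc; undo != iter; undo = undo->parent)
		wrapping_add(&undo->count, -1);
	wrapping_add(&iter->count, -1);
	return 0;
}
```

With a child whose ns_max is 1, a second model_inc(&child, false) is
refused at the parent level, model_inc(&child, true) goes through, and
an increment that would push the parent past LONG_MAX is refused even
with the override set.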

Eric

> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
> Signed-off-by: Roman Gushchin <roman.gushchin@xxxxxxxxx>
> Co-developed-by: Andrei Vagin <avagin@xxxxxxxxxx>
> Signed-off-by: Andrei Vagin <avagin@xxxxxxxxxx>
> Cc: Kees Cook <kees@xxxxxxxxxx>
> Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> Cc: Alexey Gladkov <legion@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> ---
>  include/linux/user_namespace.h | 3 ++-
>  kernel/signal.c                | 3 ++-
>  kernel/ucount.c                | 5 +++--
>  3 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 3625096d5f85..7183e5aca282 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -141,7 +141,8 @@ static inline long get_rlimit_value(struct ucounts *ucounts, enum rlimit_type ty
>  
>  long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v);
>  bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v);
> -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type);
> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
> +			    bool override_rlimit);
>  void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type);
>  bool is_rlimit_overlimit(struct ucounts *ucounts, enum rlimit_type type, unsigned long max);
>  
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 4344860ffcac..cbabb2d05e0a 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -419,7 +419,8 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>  	 */
>  	rcu_read_lock();
>  	ucounts = task_ucounts(t);
> -	sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
> +	sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING,
> +					    override_rlimit);
>  	rcu_read_unlock();
>  	if (!sigpending)
>  		return NULL;
> diff --git a/kernel/ucount.c b/kernel/ucount.c
> index 16c0ea1cb432..046b3d57ebb4 100644
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -307,7 +307,8 @@ void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type)
>  	do_dec_rlimit_put_ucounts(ucounts, NULL, type);
>  }
>  
> -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type,
> +			    bool override_rlimit)
>  {
>  	/* Caller must hold a reference to ucounts */
>  	struct ucounts *iter;
> @@ -316,7 +317,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
>  
>  	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>  		long new = atomic_long_add_return(1, &iter->rlimit[type]);
> -		if (new < 0 || new > max)
> +		if (new < 0 || (!override_rlimit && (new > max)))
>  			goto unwind;
>  		if (iter == ucounts)
>  			ret = new;