Frederick Lawler <fred@xxxxxxxxxxxxxx> writes:

> On 6/13/22 11:44 PM, Eric W. Biederman wrote:
>> Frederick Lawler <fred@xxxxxxxxxxxxxx> writes:
>>
>>> Hi Eric,
>>>
>>> On 6/13/22 12:04 PM, Eric W. Biederman wrote:
>>>> Frederick Lawler <fred@xxxxxxxxxxxxxx> writes:
>>>>
>>>>> While experimenting with the security_prepare_creds() LSM hook, we
>>>>> noticed that our EPERM error code was not propagated up the callstack.
>>>>> Instead ENOMEM is always returned. As a result, some tools may send a
>>>>> confusing error message to the user:
>>>>>
>>>>> $ unshare -rU
>>>>> unshare: unshare failed: Cannot allocate memory
>>>>>
>>>>> A user would think that the system didn't have enough memory, when
>>>>> instead the action was denied.
>>>>>
>>>>> This problem occurs because prepare_creds() and prepare_kernel_cred()
>>>>> return NULL when security_prepare_creds() returns an error code. Later,
>>>>> functions calling prepare_creds() and prepare_kernel_cred() return
>>>>> ENOMEM because they assume that a NULL meant there was no memory
>>>>> allocated.
>>>>>
>>>>> Fix this by propagating an error code from security_prepare_creds() up
>>>>> the callstack.
>>>> Why would it make sense for security_prepare_creds to return an error
>>>> code other than ENOMEM?
>>>> > That seems a bit of a violation of what that function is supposed to do
>>>>
>>>
>>> The API allows LSM authors to decide what error code is returned from the
>>> cred_prepare hook. security_task_alloc() is a similar hook, and has its return
>>> code propagated.
>> It is not an api. It is an implementation detail of the linux kernel.
>> It is a set of convenient functions that do a job.
>> The general rule is we don't support cases without an in-tree user. I
>> don't see an in-tree user.
>>
>>> I'm proposing we follow security_task_allocs() pattern, and add visibility for
>>> failure cases in prepare_creds().
>> I am asking why we would want to. Especially as it is not an API, and I
>> don't see any good reason for anything but an -ENOMEM failure to be
>> supported.
>>
> We're writing a LSM BPF policy, and not a new LSM. Our policy aims to solve
> unprivileged unshare, similar to Debian's patch [1]. We're in a position such
> that we can't use that patch because we can't block _all_ of our applications
> from performing an unshare. We prefer a granular approach. LSM BPF seems like a
> good choice.

I am quite puzzled: why doesn't /proc/sys/user/max_user_namespaces work
for you?

> Because LSM BPF exposes these hooks, we should probably treat them as an
> API. From that perspective, userspace expects unshare to return a EPERM
> when the call is denied permissions.

The BPF code gets to be treated as an out-of-tree kernel module.

>> Without an in-tree user that cares it is probably better to go the
>> opposite direction and remove the possibility of return anything but
>> memory allocation failure. That will make it clearer to implementors
>> that a general error code is not supported and this is not a location
>> to implement policy, this is only a hook to allocate state for the LSM.
>>
>
> That's a good point, and it's possible we're using the wrong hook for the
> policy. Do you know of other hooks we can look into?

Not off the top of my head.

>>>> I have probably missed a very interesting discussion where that was
>>>> mentioned but I don't see link to the discussion or anything explaining
>>>> why we want to do that in this change.
>>>>
>>>
>>> AFAIK, this is the start of the discussion.
>> You were on v3 and had an out of tree piece of code so I assumed someone
>> had at least thought about why you want to implement policy in a piece
>> of code whose only purpose is to allocate memory to store state.
>>
>
> No worries.
>
>> Eric
>>
>>
>
> Links:
> 1:
> https://sources.debian.org/patches/linux/3.16.56-1+deb8u1/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch/

Eric
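
The NULL-versus-error-code behaviour described in the quoted patch description can be illustrated with a small userspace sketch. This is not kernel code and not the patch under discussion: `struct cred` is reduced to a stub, `hook_prepare_creds()`, `prepare_creds_null()`, `prepare_creds_errptr()`, `unshare_null()` and `unshare_errptr()` are hypothetical names, and the `ERR_PTR()`/`IS_ERR()`/`PTR_ERR()` helpers are simplified stand-ins for the kernel macros of the same name.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error codes occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily reduced credential structure. */
struct cred { int uid; };

/* Hypothetical hook standing in for a policy that denies the request. */
static int hook_prepare_creds(struct cred *new_cred)
{
    (void)new_cred;
    return -EPERM;
}

/* Behaviour described in the thread: every failure collapses to NULL. */
static struct cred *prepare_creds_null(void)
{
    struct cred *new_cred = calloc(1, sizeof(*new_cred));

    if (!new_cred)
        return NULL;
    if (hook_prepare_creds(new_cred) < 0) {
        free(new_cred);
        return NULL;            /* the hook's error code is lost here */
    }
    return new_cred;
}

/* A caller that sees NULL can only assume an allocation failure. */
static int unshare_null(void)
{
    struct cred *c = prepare_creds_null();

    if (!c)
        return -ENOMEM;
    free(c);
    return 0;
}

/* Proposed direction: encode the error in the returned pointer. */
static struct cred *prepare_creds_errptr(void)
{
    struct cred *new_cred = calloc(1, sizeof(*new_cred));
    int ret;

    if (!new_cred)
        return ERR_PTR(-ENOMEM);
    ret = hook_prepare_creds(new_cred);
    if (ret < 0) {
        free(new_cred);
        return ERR_PTR(ret);    /* the hook's error code survives */
    }
    return new_cred;
}

/* A caller that checks IS_ERR() can pass the original code along. */
static int unshare_errptr(void)
{
    struct cred *c = prepare_creds_errptr();

    if (IS_ERR(c))
        return (int)PTR_ERR(c);
    free(c);
    return 0;
}
```

With the denying hook above, `unshare_null()` reports `-ENOMEM` while `unshare_errptr()` reports `-EPERM`. Whether the hook should be allowed to return anything other than `-ENOMEM` in the first place is the question raised in the reply, and the sketch takes no position on it.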