On Wed, Jun 15, 2022 at 6:30 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> On Tue, Jun 14, 2022 at 01:59:08PM -0500, Frederick Lawler wrote:
> > On 6/14/22 11:30 AM, Eric W. Biederman wrote:
> > > Frederick Lawler <fred@xxxxxxxxxxxxxx> writes:
> > >
> > > > On 6/13/22 11:44 PM, Eric W. Biederman wrote:
> > > > > Frederick Lawler <fred@xxxxxxxxxxxxxx> writes:
> > > > >
> > > > > > Hi Eric,
> > > > > >
> > > > > > On 6/13/22 12:04 PM, Eric W. Biederman wrote:
> > > > > > > Frederick Lawler <fred@xxxxxxxxxxxxxx> writes:
> > > > > > >
> > > > > > > > While experimenting with the security_prepare_creds() LSM hook, we
> > > > > > > > noticed that our EPERM error code was not propagated up the callstack.
> > > > > > > > Instead ENOMEM is always returned. As a result, some tools may send a
> > > > > > > > confusing error message to the user:
> > > > > > > >
> > > > > > > > $ unshare -rU
> > > > > > > > unshare: unshare failed: Cannot allocate memory
> > > > > > > >
> > > > > > > > A user would think that the system didn't have enough memory, when
> > > > > > > > instead the action was denied.
> > > > > > > >
> > > > > > > > This problem occurs because prepare_creds() and prepare_kernel_cred()
> > > > > > > > return NULL when security_prepare_creds() returns an error code. Later,
> > > > > > > > functions calling prepare_creds() and prepare_kernel_cred() return
> > > > > > > > ENOMEM because they assume that a NULL meant there was no memory
> > > > > > > > allocated.
> > > > > > > >
> > > > > > > > Fix this by propagating an error code from security_prepare_creds() up
> > > > > > > > the callstack.
> > > > > > > Why would it make sense for security_prepare_creds to return an error
> > > > > > > code other than ENOMEM?
> > > > > > > > That seems a bit of a violation of what that function is supposed to do
> > > > > > >
> > > > > >
> > > > > > The API allows LSM authors to decide what error code is returned from the
> > > > > > cred_prepare hook. security_task_alloc() is a similar hook, and has its return
> > > > > > code propagated.
> > > > > It is not an api. It is an implementation detail of the linux kernel.
> > > > > It is a set of convenient functions that do a job.
> > > > > The general rule is we don't support cases without an in-tree user. I
> > > > > don't see an in-tree user.
> > > > >
> > > > > > I'm proposing we follow security_task_allocs() pattern, and add visibility for
> > > > > > failure cases in prepare_creds().
> > > > > I am asking why we would want to. Especially as it is not an API, and I
> > > > > don't see any good reason for anything but an -ENOMEM failure to be
> > > > > supported.
> > > > >
> > > > We're writing a LSM BPF policy, and not a new LSM. Our policy aims to solve
> > > > unprivileged unshare, similar to Debian's patch [1]. We're in a position such
> > > > that we can't use that patch because we can't block _all_ of our applications
> > > > from performing an unshare. We prefer a granular approach. LSM BPF seems like a
> > > > good choice.
> > >
> > > I am quite puzzled why doesn't /proc/sys/user/max_user_namespaces work
> > > for you?
> > >
> >
> > We have the following requirements:
> >
> > 1. Allow list criteria
> > 2. root user must be able to create namespaces whenever
> > 3. Everything else not in 1 & 2 must be denied
> >
> > We use per task attributes to determine whether or not we allow/deny the
> > current call to unshare().
> >
> > /proc/sys/user/max_user_namespaces limits are a bit broad for this level of
> > detail.
> >
> > > > Because LSM BPF exposes these hooks, we should probably treat them as an
> > > > API. From that perspective, userspace expects unshare to return a EPERM
> > > > when the call is denied permissions.
> > >
> > > The BPF code gets to be treated as a out of tree kernel module.
> > >
> > > > > Without an in-tree user that cares it is probably better to go the
> > > > > opposite direction and remove the possibility of return anything but
> > > > > memory allocation failure. That will make it clearer to implementors
> > > > > that a general error code is not supported and this is not a location
> > > > > to implement policy, this is only a hook to allocate state for the LSM.
> > > > >
> > > >
> > > > That's a good point, and it's possible we're using the wrong hook for the
> > > > policy. Do you know of other hooks we can look into?
>
> Fwiw, from this commit it wasn't very clear what you wanted to achieve
> with this. It might be worth considering adding a new security hook for
> this. Within msft it recently came up SELinux might have an interest in
> something like this as well.

Just to clarify things a bit, I believe SELinux would have an interest
in a LSM hook capable of implementing an access control point for user
namespaces regardless of Microsoft's current needs. I suspect due to
the security relevant nature of user namespaces most other LSMs would
be interested as well; it seems like a well crafted hook would be
welcome by most folks I think.

--
paul-moore.com