On Wed, Jun 15, 2022 at 3:14 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
>
> On Wed, Jun 15, 2022 at 6:30 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > On Tue, Jun 14, 2022 at 01:59:08PM -0500, Frederick Lawler wrote:
> > > On 6/14/22 11:30 AM, Eric W. Biederman wrote:
> > > > Frederick Lawler <fred@xxxxxxxxxxxxxx> writes:
> > > >
> > > > > On 6/13/22 11:44 PM, Eric W. Biederman wrote:
> > > > > > Frederick Lawler <fred@xxxxxxxxxxxxxx> writes:
> > > > > >
> > > > > > > Hi Eric,
> > > > > > >
> > > > > > > On 6/13/22 12:04 PM, Eric W. Biederman wrote:
> > > > > > > > Frederick Lawler <fred@xxxxxxxxxxxxxx> writes:
> > > > > > > >
> > > > > > > > > While experimenting with the security_prepare_creds() LSM hook, we
> > > > > > > > > noticed that our EPERM error code was not propagated up the callstack.
> > > > > > > > > Instead ENOMEM is always returned. As a result, some tools may send a
> > > > > > > > > confusing error message to the user:
> > > > > > > > >
> > > > > > > > >   $ unshare -rU
> > > > > > > > >   unshare: unshare failed: Cannot allocate memory
> > > > > > > > >
> > > > > > > > > A user would think that the system didn't have enough memory, when
> > > > > > > > > instead the action was denied.
> > > > > > > > >
> > > > > > > > > This problem occurs because prepare_creds() and prepare_kernel_cred()
> > > > > > > > > return NULL when security_prepare_creds() returns an error code. Later,
> > > > > > > > > functions calling prepare_creds() and prepare_kernel_cred() return
> > > > > > > > > ENOMEM because they assume that a NULL meant there was no memory
> > > > > > > > > allocated.
> > > > > > > > >
> > > > > > > > > Fix this by propagating an error code from security_prepare_creds() up
> > > > > > > > > the callstack.
> > > > > > > > Why would it make sense for security_prepare_creds to return an error
> > > > > > > > code other than ENOMEM?
> > > > > > > > > That seems a bit of a violation of what that function is supposed to do
> > > > > > > >
> > > > > > >
> > > > > > > The API allows LSM authors to decide what error code is returned from the
> > > > > > > cred_prepare hook. security_task_alloc() is a similar hook, and has its return
> > > > > > > code propagated.
> > > > > > It is not an api. It is an implementation detail of the linux kernel.
> > > > > > It is a set of convenient functions that do a job.
> > > > > > The general rule is we don't support cases without an in-tree user. I
> > > > > > don't see an in-tree user.
> > > > > >
> > > > > > > I'm proposing we follow security_task_allocs() pattern, and add visibility for
> > > > > > > failure cases in prepare_creds().
> > > > > > I am asking why we would want to. Especially as it is not an API, and I
> > > > > > don't see any good reason for anything but an -ENOMEM failure to be
> > > > > > supported.
> > > > > >
> > > > > We're writing a LSM BPF policy, and not a new LSM. Our policy aims to solve
> > > > > unprivileged unshare, similar to Debian's patch [1]. We're in a position such
> > > > > that we can't use that patch because we can't block _all_ of our applications
> > > > > from performing an unshare. We prefer a granular approach. LSM BPF seems like a
> > > > > good choice.
> > > >
> > > > I am quite puzzled why doesn't /proc/sys/user/max_user_namespaces work
> > > > for you?
> > > >
> > >
> > > We have the following requirements:
> > >
> > > 1. Allow list criteria
> > > 2. root user must be able to create namespaces whenever
> > > 3. Everything else not in 1 & 2 must be denied
> > >
> > > We use per task attributes to determine whether or not we allow/deny the
> > > current call to unshare().
> > >
> > > /proc/sys/user/max_user_namespaces limits are a bit broad for this level of
> > > detail.
> > > >
> > > > > Because LSM BPF exposes these hooks, we should probably treat them as an
> > > > > API. From that perspective, userspace expects unshare to return a EPERM
> > > > > when the call is denied permissions.
> > > >
> > > > The BPF code gets to be treated as a out of tree kernel module.
> > > >
> > > >
> > > > > > Without an in-tree user that cares it is probably better to go the
> > > > > > opposite direction and remove the possibility of return anything but
> > > > > > memory allocation failure. That will make it clearer to implementors
> > > > > > that a general error code is not supported and this is not a location
> > > > > > to implement policy, this is only a hook to allocate state for the LSM.
> > > > > >
> > > > >
> > > > > That's a good point, and it's possible we're using the wrong hook for the
> > > > > policy. Do you know of other hooks we can look into?
> >
> > Fwiw, from this commit it wasn't very clear what you wanted to achieve
> > with this. It might be worth considering adding a new security hook for
> > this. Within msft it recently came up SELinux might have an interest in
> > something like this as well.
>
> Just to clarify things a bit, I believe SELinux would have an interest
> in a LSM hook capable of implementing an access control point for user
> namespaces regardless of Microsoft's current needs. I suspect due to
> the security relevant nature of user namespaces most other LSMs would
> be interested as well; it seems like a well crafted hook would be
> welcome by most folks I think.
>
> --
> paul-moore.com

Just to get the full picture: is there actually a good reason not to
make this hook support this scenario? I understand it was not
originally intended for this, but it is well positioned in the code,
covers multiple subsystems (not only user namespaces), doesn't require
changing the LSM interface and it already does the job - just the
kernel internals need to respect the error code better. What bad
things can happen if we extend its use case to not only allocate
resources in LSMs?

After all, the original Linus email introducing Linux stated that
Linux was not intended to be a great OS, but here we are :)

Ignat

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-cachefs
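
For readers following the thread, the behaviour described in the original report (a NULL return from prepare_creds() being read by callers as ENOMEM) can be illustrated with a small userspace sketch. This is not the kernel code and not the proposed patch: err_ptr(), ptr_err() and is_err() are simplified stand-ins modeled on the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() helpers, and struct cred, fake_security_prepare_creds(), prepare_creds_null(), prepare_creds_errptr(), unshare_null() and unshare_errptr() are hypothetical names invented for illustration. Encoding the error in the returned pointer is assumed here as one possible way to propagate the hook's error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins modeled on the kernel's ERR_PTR(), PTR_ERR() and
 * IS_ERR() helpers; simplified and hypothetical, for illustration only. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily reduced credential structure. */
struct cred {
	int uid;
};

/* Hypothetical stand-in for the LSM hook: 0 on success, -errno on denial. */
static int fake_security_prepare_creds(int hook_ret)
{
	return hook_ret;
}

/* Pattern described in the report: every failure collapses to NULL. */
static struct cred *prepare_creds_null(int hook_ret)
{
	struct cred *new = calloc(1, sizeof(*new));

	if (!new)
		return NULL;
	if (fake_security_prepare_creds(hook_ret) < 0) {
		free(new);
		return NULL;	/* the reason for the failure is lost here */
	}
	return new;
}

/* The caller sees only NULL, so it can only guess -ENOMEM. */
static int unshare_null(int hook_ret)
{
	struct cred *new = prepare_creds_null(hook_ret);

	if (!new)
		return -ENOMEM;
	free(new);
	return 0;
}

/* Assumed alternative: carry the hook's error code in the pointer. */
static struct cred *prepare_creds_errptr(int hook_ret)
{
	struct cred *new = calloc(1, sizeof(*new));
	int ret;

	if (!new)
		return err_ptr(-ENOMEM);
	ret = fake_security_prepare_creds(hook_ret);
	if (ret < 0) {
		free(new);
		return err_ptr(ret);
	}
	return new;
}

/* The caller can now hand the original error code back to userspace. */
static int unshare_errptr(int hook_ret)
{
	struct cred *new = prepare_creds_errptr(hook_ret);

	if (is_err(new))
		return (int)ptr_err(new);
	free(new);
	return 0;
}
```

With a hook that denies the call, unshare_null(-EPERM) reports -ENOMEM while unshare_errptr(-EPERM) reports -EPERM. Whether the hook should be allowed to return anything other than -ENOMEM is exactly the question debated above, so the sketch shows the mechanism only and takes no side.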