On 6/7/22 1:47 AM, Christian Brauner wrote:
On Wed, Jun 01, 2022 at 11:44:07AM -0700, Dave Marchevsky wrote:
Since commit 73f03c2b4b52 ("fuse: Restrict allow_other to the
superblock's namespace or a descendant"), access to allow_other FUSE
filesystems has been limited to users in the mounting user namespace or
descendants. This prevents a process that is privileged in its userns -
but not its parent namespaces - from mounting a FUSE fs w/ allow_other
that is accessible to processes in parent namespaces.
While this restriction makes sense overall it breaks a legitimate
usecase: I have a tracing daemon which needs to peek into
process' open files in order to symbolicate - similar to 'perf'. The
daemon is a privileged process in the root userns, but is unable to peek
into FUSE filesystems mounted with allow_other by processes in child
namespaces.
This patch adds a module param, allow_other_parent_userns, to act as an
escape hatch for this descendant userns logic. Setting
allow_other_parent_userns allows non-descendant-userns processes to
access FUSEs mounted with allow_other. A sysadmin setting this param
must trust allow_other FUSEs on the host to not DoS processes as
described in 73f03c2b4b52.
Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
---
This is a followup to a previous attempt to solve same problem in a
different way: "fuse: allow CAP_SYS_ADMIN in root userns to access
allow_other mount" [0].
v1 -> v2:
* Use module param instead of capability check
[0]: lore.kernel.org/linux-fsdevel/20211111221142.4096653-1-davemarchevsky@xxxxxx
fs/fuse/dir.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 9dfee44e97ad..3d593ae7dc66 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -11,6 +11,7 @@
#include <linux/pagemap.h>
#include <linux/file.h>
#include <linux/fs_context.h>
+#include <linux/moduleparam.h>
#include <linux/sched.h>
#include <linux/namei.h>
#include <linux/slab.h>
@@ -21,6 +22,12 @@
#include <linux/types.h>
#include <linux/kernel.h>
+static bool __read_mostly allow_other_parent_userns;
+module_param(allow_other_parent_userns, bool, 0644);
+MODULE_PARM_DESC(allow_other_parent_userns,
+ "Allow users not in mounting or descendant userns "
+ "to access FUSE with allow_other set");
The name of the parameter also suggests that access is granted to parent
userns tasks whereas the change seems to me to allows every task access
to that fuse filesystem independent of what userns they are in.
So even a task in a sibling userns could - probably with rather
elaborate mount propagation trickery - access that fuse filesystem.
AFaict, either the module parameter is misnamed or the patch doesn't
implement the behavior expressed in the name.
The original patch restricted access to a CAP_SYS_ADMIN capable task.
Did we agree that it was a good idea to weaken it to all tasks?
Shouldn't we still just restrict this to CAP_SYS_ADMIN capable tasks in
the initial userns?
I think it's fine to allow for CAP_SYS_ADMIN only, but can we then
ignore the allow_other mount option in such case? The idea is that
CAP_SYS_ADMIN allows you to read FUSE-backed contents no matter what, so
user not mounting with allow_other preventing root from reading contents
defeats the purpose at least partially.
+
static void fuse_advise_use_readdirplus(struct inode *dir)
{
struct fuse_inode *fi = get_fuse_inode(dir);
@@ -1230,7 +1237,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
const struct cred *cred;
if (fc->allow_other)
- return current_in_userns(fc->user_ns);
+ return current_in_userns(fc->user_ns) || allow_other_parent_userns;
cred = current_cred();
if (uid_eq(cred->euid, fc->user_id) &&
--
2.30.2