Re: [RFC][PATCH] tracefs: Set all files to the same group ownership as the mount option

Kalesh Singh <kaleshsingh@xxxxxxxxxx> · Tue, 7 Dec 2021 09:04:30 -0800

On Mon, Dec 6, 2021 at 6:12 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> From: "Steven Rostedt (VMware)" <rostedt@xxxxxxxxxxx>
>
> As people have been asking to allow non-root processes to have access to
> the tracefs directory, it was considered best to only allow groups to have
> access to the directory, where it is easier to just set the tracefs file
> system to a specific group (as other would be too dangerous), and that way
> the admins could pick which processes would have access to tracefs.
>
> Unfortunately, this broke tooling on Android that expected the other bit
> to be set. For some special cases, for non-root tools to trace the system,
> tracefs would be mounted and change the permissions of the top level
> directory which gave access to all running tasks permission to the
> tracing directory. Even though this would be dangerous to do in a
> production environment, for testing environments this can be useful.
>
> Now with the new changes to not allow other (which is still the proper
> thing to do), it breaks the testing tooling. Now more code needs to be
> loaded on the system to change ownership of the tracing directory.
>
> The real solution is to have tracefs honor the gid=xxx option when
> mounting. That is,
>
> (tracing group tracing has value 1003)
>
>  mount -t tracefs -o gid=1003 tracefs /sys/kernel/tracing
>
> should have it that all files in the tracing directory should be of the
> given group.
>
> Copy the logic from d_walk() from dcache.c and simplify it for the mount
> case of tracefs if gid is set. All the files in tracefs will be walked and
> their group will be set to the value passed in.
>
> Reported-by: Kalesh Singh <kaleshsingh@xxxxxxxxxx>
> Reported-by: Yabin Cui <yabinc@xxxxxxxxxx>
> Fixes: 49d67e445742 ("tracefs: Have tracefs directories not set OTH permission bits by default")
> Signed-off-by: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
> ---
>
>   I'm posting this as an RFC as I want to make sure this is the proper way
>   to handle this. It really makes sense. As tracefs is simply a file system
>   with a bunch of control knobs to control tracing, if you mount it with
>   gid=xxx then the control knobs should be controlled by group xxx.
>
>
>  fs/tracefs/inode.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 74 insertions(+)
>
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index 925a621b432e..f20f575cdaef 100644
> +++ b/fs/tracefs/inode.c
> @@ -161,6 +161,79 @@ struct tracefs_fs_info {
>         struct tracefs_mount_opts mount_opts;
>  };
>
> +static void change_gid(struct dentry *dentry, kgid_t gid)
> +{
> +       if (!dentry->d_inode)
> +               return;
> +       dentry->d_inode->i_gid = gid;
> +}
> +
> +/*
> + * Taken from d_walk, but without he need for handling renames.
> + * Nothing can be renamed while walking the list, as tracefs
> + * does not support renames. This is only called when mounting
> + * or remounting the file system, to set all the files to
> + * the given gid.
> + */

Hi Steve,

One thing that I missed before: There are files that can be generated
after the mount, for instance when a new synthetic event is added new
entries for that event are created under events/synthetic/ and when a
new instance is created the new entries generated under instances/.
These new entries don't honor the gid specified when mounting. Could
we make it so that they also respect the specified gid?

Thanks,
Kalesh

> +static void set_gid(struct dentry *parent, kgid_t gid)
> +{
> +       struct dentry *this_parent;
> +       struct list_head *next;
> +
> +       this_parent = parent;
> +       spin_lock(&this_parent->d_lock);
> +
> +       change_gid(this_parent, gid);
> +repeat:
> +       next = this_parent->d_subdirs.next;
> +resume:
> +       while (next != &this_parent->d_subdirs) {
> +               struct list_head *tmp = next;
> +               struct dentry *dentry = list_entry(tmp, struct dentry, d_child);
> +               next = tmp->next;
> +
> +               spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> +
> +               change_gid(dentry, gid);
> +
> +               if (!list_empty(&dentry->d_subdirs)) {
> +                       spin_unlock(&this_parent->d_lock);
> +                       spin_release(&dentry->d_lock.dep_map, _RET_IP_);
> +                       this_parent = dentry;
> +                       spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
> +                       goto repeat;
> +               }
> +               spin_unlock(&dentry->d_lock);
> +       }
> +       /*
> +        * All done at this level ... ascend and resume the search.
> +        */
> +       rcu_read_lock();
> +ascend:
> +       if (this_parent != parent) {
> +               struct dentry *child = this_parent;
> +               this_parent = child->d_parent;
> +
> +               spin_unlock(&child->d_lock);
> +               spin_lock(&this_parent->d_lock);
> +
> +               /* go into the first sibling still alive */
> +               do {
> +                       next = child->d_child.next;
> +                       if (next == &this_parent->d_subdirs)
> +                               goto ascend;
> +                       child = list_entry(next, struct dentry, d_child);
> +               } while (unlikely(child->d_flags & DCACHE_DENTRY_KILLED));
> +               rcu_read_unlock();
> +               goto resume;
> +       }
> +       rcu_read_unlock();
> +
> +out_unlock:
> +       spin_unlock(&this_parent->d_lock);
> +       return;
> +}
> +
>  static int tracefs_parse_options(char *data, struct tracefs_mount_opts *opts)
>  {
>         substring_t args[MAX_OPT_ARGS];
> @@ -193,6 +266,7 @@ static int tracefs_parse_options(char *data, struct tracefs_mount_opts *opts)
>                         if (!gid_valid(gid))
>                                 return -EINVAL;
>                         opts->gid = gid;
> +                       set_gid(tracefs_mount->mnt_root, gid);
>                         break;
>                 case Opt_mode:
>                         if (match_octal(&args[0], &option))
> --
> 2.31.1
>