Re: [RFC] struct filename, io_uring and audit troubles

Paul Moore <paul@xxxxxxxxxxxxxx> · Mon, 23 Sep 2024 20:11:51 -0400

On Mon, Sep 23, 2024 at 4:37 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Sep 23, 2024 at 12:14:29PM -0400, Paul Moore wrote:
>
> [ordering and number of PATH items for syscall]
>
> > >From my point of view, stuff like that is largely driven by enterprise
> > distros chasing 3rd party security certifications so they can sell
> > products/services to a certain class of users.  These are the same
> > enterprise distros that haven't really bothered to contribute a lot to
> > the upstream Linux kernel's audit subsystem lately so I'm not going to
> > worry too much about them at this point.
>
> Umm...  IIRC, sgrubb had been involved in the spec-related horrors, but
> that was a long time ago...

Yep, he was.  Last I spoke to Steve a year or so ago, audit was no
longer part of his job description; Steve still maintains his
userspace audit tools, but that is a nights/weekends job as far as I
understand.

The last time I was involved in any audit/CC spec related work was
well over a decade ago now, and all of those CC protection profiles
have long since expired and been replaced.

> > where I would like to take audit ... eventually).  Assuming your ideas
> > for struct filename don't significantly break audit you can consider
> > me supportive so long as we still have a way to take a struct filename
> > reference inside the audit_context; we need to keep that ref until
> > syscall/io_uring-op exit time as we can't be certain if we need to log
> > the PATH until we know the success/fail status of the operation (among
> > other things).
>
> OK...  As for what I would like to do:
>
>         * go through the VFS side of things and make sure we have a consistent
> set of helpers that would take struct filename * - *not* the ad-hoc mix we
> have right now, when that's basically driven by io_uring borging them in
> one by one - or duplicates them without bothering to share helpers.
> E.g. mkdirat(2) does getname() and passes it to do_mkdirat(), which
> consumes the sucker; so does mknodat(2), but do_mknodat() is static.  OTOH,
> path_setxattr() does setxattr_copy(), then retry_estale loop with
> user_path_at() + mnt_want_write() + do_setxattr() + mnt_drop_write() + path_put()
> as a body, while on io_uring side we have retry_estale loop with filename_lookup() +
> (io_uring helper that does mnt_want_write() + do_setxattr() + mnt_drop_write()) +
> path_put().
>         Sure, that user_path_at() call is getname() + filename_lookup() + putname(),
> so they are equivalent, but keeping that shite in sync is going to be trouble.

I obviously trust you to do the right thing with the VFS bits, and
having a well defined struct filename interface sounds like a good
thing from an audit perspective.  I don't believe it completely solves
the audit/io_uring issue, but it should make things easier and
hopefully will result in less chance of breakage in the future.

>         * get rid of the "repeated getname() on the same address is going to
> give you the same object" - that can't be relied upon without audit, for one
> thing and for another... having a syscall that takes two pathnames that gives
> different audit log (if not predicate evaluation) in cases when those are
> identical pointers vs. strings with identical contenst is, IMO, somewhat
> undesirable.  That kills filename->uaddr.

/uaddr/uptr/ if I'm following you correctly, but yeah, that all seems good.

>         * looking at the users of that stuff, I would probably prefer to
> separate getname*() from insertion into audit context.  It's not that
> tricky - __set_nameidata() catches *everything* that uses nd->name (i.e.
> all that audit_inode() calls in fs/namei.c use).

That should be a pretty significant simplification, that sounds good to me.

> ... What remains is
>         do_symlinkat() for symlink body
>         fs_index() on the argument (if we want to bother - it's a part
> of weird Missed'em'V sysfs(2) syscall; I sincerely doubt that there's
> anybody who'd use it)

We probably should bother, folks that really care about audit don't
like blind spots.  Perhaps make it a separate patch if it isn't too
ugly to split it out.

>         fsconfig(2) FSCONFIG_SET_PATH handling.
>         mq_open(2) and mq_unlink(2) - those bypass the normal pathwalk
> logics, so __set_nameidata() won't catch them.
>         _maybe_ alpha osf_mount(2) devname argument; or we could get rid
> of that stupidity and have it use copy_mount_string() like mount(2) does,
> instead of messing with getname().
>         That's all it takes.  With that done, we can kill ->aname;
> just look in the ->names_list for the first entry with given ->name -
> as in, given struct filename * value, no need to look inside.

Seems reasonable to me.  I can't imagine these special cases being any
worse than what we have now in fs/namei.c, and if nothing else having
a single catch point for the bulk of the VFS lookups makes it worth it
as far as I'm concerned.

-- 
paul-moore.com