Re: [RFC] struct filename, io_uring and audit troubles

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Mon, 23 Sep 2024 21:36:59 +0100

On Mon, Sep 23, 2024 at 12:14:29PM -0400, Paul Moore wrote:

[ordering and number of PATH items for syscall]

> >From my point of view, stuff like that is largely driven by enterprise
> distros chasing 3rd party security certifications so they can sell
> products/services to a certain class of users.  These are the same
> enterprise distros that haven't really bothered to contribute a lot to
> the upstream Linux kernel's audit subsystem lately so I'm not going to
> worry too much about them at this point.

Umm...  IIRC, sgrubb had been involved in the spec-related horrors, but
that was a long time ago...

> where I would like to take audit ... eventually).  Assuming your ideas
> for struct filename don't significantly break audit you can consider
> me supportive so long as we still have a way to take a struct filename
> reference inside the audit_context; we need to keep that ref until
> syscall/io_uring-op exit time as we can't be certain if we need to log
> the PATH until we know the success/fail status of the operation (among
> other things).

OK...  As for what I would like to do:

	* go through the VFS side of things and make sure we have a consistent
set of helpers that would take struct filename * - *not* the ad-hoc mix we
have right now, when that's basically driven by io_uring borging them in
one by one - or duplicates them without bothering to share helpers.
E.g. mkdirat(2) does getname() and passes it to do_mkdirat(), which
consumes the sucker; so does mknodat(2), but do_mknodat() is static.  OTOH,
path_setxattr() does setxattr_copy(), then retry_estale loop with
user_path_at() + mnt_want_write() + do_setxattr() + mnt_drop_write() + path_put()
as a body, while on io_uring side we have retry_estale loop with filename_lookup() +
(io_uring helper that does mnt_want_write() + do_setxattr() + mnt_drop_write()) +
path_put().
	Sure, that user_path_at() call is getname() + filename_lookup() + putname(),
so they are equivalent, but keeping that shite in sync is going to be trouble.

	* get rid of the "repeated getname() on the same address is going to
give you the same object" - that can't be relied upon without audit, for one
thing and for another... having a syscall that takes two pathnames that gives
different audit log (if not predicate evaluation) in cases when those are
identical pointers vs. strings with identical contenst is, IMO, somewhat
undesirable.  That kills filename->uaddr.

	* looking at the users of that stuff, I would probably prefer to
separate getname*() from insertion into audit context.  It's not that
tricky - __set_nameidata() catches *everything* that uses nd->name (i.e.
all that audit_inode() calls in fs/namei.c use).  What remains is
	do_symlinkat() for symlink body
	fs_index() on the argument (if we want to bother - it's a part
of weird Missed'em'V sysfs(2) syscall; I sincerely doubt that there's
anybody who'd use it)
	fsconfig(2) FSCONFIG_SET_PATH handling.
	mq_open(2) and mq_unlink(2) - those bypass the normal pathwalk
logics, so __set_nameidata() won't catch them.
	_maybe_ alpha osf_mount(2) devname argument; or we could get rid
of that stupidity and have it use copy_mount_string() like mount(2) does,
instead of messing with getname().
	That's all it takes.  With that done, we can kill ->aname;
just look in the ->names_list for the first entry with given ->name -
as in, given struct filename * value, no need to look inside.