[RFC] struct filename, io_uring and audit troubles

	Looks like things like async unlink might fuck the audit
very badly.  io_uring does getname() in the originating thread and uses
the result at the time of operation, which can happen in a different
thread.  Moreover, by that time the original syscall might very well
have returned.

	The trouble is, getname() establishes linkage between the struct
filename and struct audit_name; filename->aname and audit_name->name
respectively.  struct filename can get moved from one thread to another;
struct audit_name is very much tied to audit_context, which is per-thread
- the first few (5, currently) audit_name instances are embedded into
audit_context.  The rest gets allocated dynamically, but all of them
are placed on audit_context::names_list.
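
	To make the layout above concrete, here is a minimal userspace
sketch of it.  It is a model, not kernel code: every model_* name is
made up for illustration, the list is a plain singly linked one, and the
reference taken while linking is an assumption based on the "references
back to filename instances are dropped" remark further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_EMBEDDED_NAMES 5   /* "the first few (5, currently)" */

struct model_audit_name;

/* Stand-in for struct filename: can be handed to another thread. */
struct model_filename {
	const char *name;
	int refcnt;                       /* plain int, not atomic */
	struct model_audit_name *aname;   /* forward link */
};

/* Stand-in for struct audit_name: owned by one per-thread context. */
struct model_audit_name {
	struct model_filename *name;      /* back link */
	struct model_audit_name *next;    /* simplified names_list linkage */
	int dynamic;                      /* 1 if malloc()ed, 0 if embedded */
};

/* Stand-in for the per-thread audit_context. */
struct model_audit_context {
	struct model_audit_name embedded[MODEL_EMBEDDED_NAMES];
	int name_count;
	struct model_audit_name *names_list;
};

/* Use an embedded slot while any are left, otherwise allocate. */
static struct model_audit_name *model_alloc_name(struct model_audit_context *ctx)
{
	struct model_audit_name *n;

	if (ctx->name_count < MODEL_EMBEDDED_NAMES) {
		n = &ctx->embedded[ctx->name_count];
		n->dynamic = 0;
	} else {
		n = malloc(sizeof(*n));
		if (!n)
			return NULL;
		n->dynamic = 1;
	}
	n->name = NULL;
	n->next = ctx->names_list;        /* every entry lands on the list */
	ctx->names_list = n;
	ctx->name_count++;
	return n;
}

/* Set up the two-way linkage described for getname() (assumed shape). */
static int model_link(struct model_audit_context *ctx, struct model_filename *f)
{
	struct model_audit_name *n = model_alloc_name(ctx);

	if (!n)
		return -1;
	n->name = f;
	f->aname = n;
	f->refcnt++;                      /* assumption: audit side holds a ref */
	return 0;
}
```

The point of the model is only that the forward pointer lives in an
object that can travel, while the thing it points to stays with the
context of whichever thread did the linking.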

	At audit_free_names() they are all wiped out - references
back to filename instances are dropped, dynamically allocated ones
are freed, and while embedded ones survive, they are zeroed out on
reuse by audit_alloc_name().  audit_free_names() is called on each
audit_reset_context(), which is done by __audit_syscall_exit() and
(in states other than AUDIT_CTX_SYSCALL) __audit_uring_exit().

	Linkage from filename to audit_name is used by __audit_inode().
It definitely expects the reference back to filename to be stable.
And in a situation where io_uring has offloaded a directory operation to
a helper thread, that is not guaranteed.
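
	A single-threaded replay of that sequence, as a hedged sketch.
Again a userspace model with invented model_* names; it only uses
embedded slots and assumes that freeing the names drops the reference
and leaves the slot contents alone until reuse, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_EMBEDDED_NAMES 5

struct model_audit_name;

struct model_filename {
	const char *name;
	int refcnt;
	struct model_audit_name *aname;
};

struct model_audit_name {
	struct model_filename *name;
};

struct model_audit_context {
	struct model_audit_name embedded[MODEL_EMBEDDED_NAMES];
	int name_count;
};

/* getname()-time linkage; embedded slots only, to keep the model short. */
static void model_link(struct model_audit_context *ctx, struct model_filename *f)
{
	struct model_audit_name *n;

	assert(ctx->name_count < MODEL_EMBEDDED_NAMES);
	n = &ctx->embedded[ctx->name_count++];
	memset(n, 0, sizeof(*n));         /* embedded slots are zeroed on reuse */
	n->name = f;
	f->aname = n;
	f->refcnt++;
}

/* Model of audit_free_names(): drop references, forget the entries. */
static void model_free_names(struct model_audit_context *ctx)
{
	for (int i = 0; i < ctx->name_count; i++)
		if (ctx->embedded[i].name)
			ctx->embedded[i].name->refcnt--;
	ctx->name_count = 0;              /* slot contents survive until reuse */
}

/* What a late user of f->aname would have to verify before trusting it. */
static int model_aname_still_ours(const struct model_filename *f)
{
	return f->aname && f->aname->name == f;
}

/*
 * Replay: submit, return from the syscall, make another syscall, then
 * let the "helper" look at the first filename.  Returns 1 if the linkage
 * went stale, 0 otherwise.
 */
static int model_replay(void)
{
	struct model_audit_context originator = {0};
	struct model_filename queued = { "victim", 1, NULL };
	struct model_filename other = { "unrelated", 1, NULL };

	model_link(&originator, &queued);   /* getname() in originator */
	model_free_names(&originator);      /* originating syscall returns */
	model_link(&originator, &other);    /* next syscall reuses slot 0 */

	/* The helper picks up 'queued': its aname now describes 'other'. */
	return queued.aname == other.aname && !model_aname_still_ours(&queued);
}
```

In the model the stale pointer at least still refers to live memory;
with a dynamically allocated entry it would point at freed storage
instead.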

	Another fun bit is that both audit_inode() and audit_inode_child()
may bump the refcount on struct filename.  Which can get really fishy
if they get called by a helper thread while the originator is exiting
the syscall - putname() from audit_free_names() in the originator vs.
refcount increment in the helper is Not Nice(tm), what with the refcount
not being atomic.
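
	For reference, the shape of an atomic count, sketched in userspace
C11 rather than with the kernel's own primitives.  The model_* names are
hypothetical, and this only addresses the increment-vs-decrement race on
the counter; it does nothing about the linkage going stale.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for struct filename with an atomic reference count. */
struct model_filename {
	const char *name;
	atomic_int refcnt;
};

static void model_init(struct model_filename *f, const char *name)
{
	f->name = name;
	atomic_init(&f->refcnt, 1);       /* creator holds the first reference */
}

/* Take an extra reference; the caller must already hold one. */
static void model_get(struct model_filename *f)
{
	int old = atomic_fetch_add_explicit(&f->refcnt, 1, memory_order_relaxed);

	assert(old > 0);
	(void)old;
}

/*
 * Drop a reference.  Returns true when the caller dropped the last one
 * and is therefore the one that should free the object.
 */
static bool model_put(struct model_filename *f)
{
	int old = atomic_fetch_sub_explicit(&f->refcnt, 1, memory_order_acq_rel);

	assert(old > 0);
	return old == 1;
}
```

With read-modify-write done as one atomic step, a get in one thread and
a put in another can no longer lose an update, which is the failure mode
a plain "refcnt++" against "refcnt--" is exposed to.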

	Potential solutions:

* don't bother with audit_name creation and linkage in getname(); do that
when we start using the sucker.  Doing that from __set_nameidata() will
catch the majority of the stuff that ever gets audit_inode* called for it
(the only exceptions are mq_open(2) and mq_unlink(2)).  Unfortunately,
each audit_name instance gets spewed into logs, so we would need to
bring the rest of that shite in, including the things like symlink
bodies (note that for io_uring-originating symlink we'd need that done
in do_symlinkat()), etc.  Unpleasant, that.

* make refcount atomic, add a pointer to audit_context or even task_struct
in audit_name, have the "use name->aname if the type is acceptable"
logic in audit_inode dependent upon the name->aname->owner matching
what we want.  With some locking to make the check itself safe.

* make refcount atomic, get rid of ->aname and have audit_inode() scan
the names_list for entries with matching ->name and type - and that
before the existing scan with ->name->name comparisons.

* something else?
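
	A rough sketch of what the lookup in the third option (no ->aname,
scan by pointer first) could look like, as a userspace model.  All
model_* names are invented, the list is simplified to a singly linked
one, and the "acceptable type" rule used here (exact match or unknown)
is an assumption made for the sake of the example, not a statement of
what audit_inode() accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum model_type { MODEL_TYPE_UNKNOWN, MODEL_TYPE_NORMAL, MODEL_TYPE_PARENT };

struct model_filename {
	const char *name;
};

struct model_audit_name {
	struct model_filename *name;
	enum model_type type;
	struct model_audit_name *next;
};

struct model_audit_context {
	struct model_audit_name *names_list;
};

/* Assumed rule: an entry fits if its type matches or is still unknown. */
static int model_type_ok(enum model_type have, enum model_type want)
{
	return have == want || have == MODEL_TYPE_UNKNOWN;
}

static struct model_audit_name *
model_find(struct model_audit_context *ctx, struct model_filename *f,
	   enum model_type want)
{
	struct model_audit_name *n;

	/* Pass 1: identity match on the filename pointer itself. */
	for (n = ctx->names_list; n; n = n->next)
		if (n->name == f && model_type_ok(n->type, want))
			return n;

	/* Pass 2: the fallback scan comparing ->name->name strings. */
	for (n = ctx->names_list; n; n = n->next) {
		if (!n->name || !n->name->name || !f->name)
			continue;
		if (!strcmp(n->name->name, f->name) &&
		    model_type_ok(n->type, want))
			return n;
	}
	return NULL;
}
```

Since only the caller's own list is walked, a filename that was linked
in some other thread's context simply fails the pointer pass; nothing is
ever dereferenced through the filename to reach an audit_name.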

	Suggestions _not_ involving creative uses of TARDIS would
be welcome.



