Re: [RFC] struct filename, io_uring and audit troubles

Paul Moore <paul@xxxxxxxxxxxxxx> · Mon, 23 Sep 2024 08:54:03 -0400

On Mon, Sep 23, 2024 at 2:31 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 9/22/24 7:50 PM, Al Viro wrote:
> > On Sun, Sep 22, 2024 at 01:49:01AM +0100, Al Viro wrote:
> >
> >>      Another fun bit is that both audit_inode() and audit_inode_child()
> >> may bump the refcount on struct filename.  Which can get really fishy
> >> if they get called by helper thread while the originator is exiting the
> >> syscall - putname() from audit_free_names() in originator vs. refcount
> >> increment in helper is Not Nice(tm), what with the refcount not being
> >> atomic.
> >
> > *blink*
> >
> > OK, I really wonder which version had I been reading at the time; refcount
> > is, indeed, atomic these days.
> >
> > Other problems (->aname pointing to other thread's struct audit_names
> > and outliving reuse of those, as well as insane behaviour of audit predicates
> > on symlink(2)) are, unfortunately, quite real - on the current mainline.
>
> Traveling but took a quick look. As far as I can tell, for the "reuse
> someone elses aname", we could do either:
>
> 1) Just don't reuse the entry. Then we can drop the struct
>    filename->aname completely as well. Yes that might incur an extra
>    alloc for the odd case of audit_enabled and being deep enough that
>    the preallocated names have been used, but doesn't anyone really
>    care? It'll be noise in the overhead anyway. Side note - that would
>    unalign struct filename again. Would be nice to drop audit_names from
>    a core fs struct...
>
> 2) Add a ref to struct audit_names, RCU kfree it when it drops to zero.
>    This would mean dropping struct audit_context->preallocated_names, as
>    otherwise we'd run into trouble there if a context gets blown away
>    while someone else has a ref to that audit_names struct. We could do
>    this without a ref as well, as long as we can store an audit_context
>    pointer in struct audit_names and be able to validate it under RCU.
>    If ctx doesn't match, don't use it.
>
> And probably other ways too, those were just the two immediate ones I
> thought it. Seems like option 1 is simpler and just fine? Quick hack:

[Sorry for the delay, between flying back home, and just not wanting
to think about the kernel for a day, I took the weekend "off".]

Jens and I have talked about similar issues in the past, and I think
the only real solution to ensure the correctness of the audit records
and provide some consistency between the io_uring approach and
traditional syscalls, is to introduce a mechanism where we
create/clone an audit_context in the io_uring prep stage to capture
things like PATH records, stash that audit_context in the io_kiocb
struct, and then restore it later when io_uring does/finishes the
operation.  I'm reasonably confident that we don't need to do it for
all of the io_uring ops, just the !audit_skip case.

I'm always open to ideas, but everything else I can think of is either
far too op-specific to be maintainable long term, a performance
nightmare, or just plain wrong with respect to the audit records.

I keep hoping to have some time to code it up properly, but so far
this year has been an exercise in "I'll just put this fire over here
with the other fire".  Believe it or not, this is at the top of my
TODO list, perhaps this week I can dedicate some time to this.

-- 
paul-moore.com