Re: [PATCH 4.10 070/111] audit: fix auditd/kernel connection state tracking

Ingo Molnar <mingo@xxxxxxxxxx> · Wed, 21 Feb 2018 09:46:02 +0100

* Paul Moore <paul@xxxxxxxxxxxxxx> wrote:

> On Tue, Feb 20, 2018 at 10:18 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Tue, Feb 20, 2018 at 09:51:08AM -0500, Paul Moore wrote:
> >> On Tue, Feb 20, 2018 at 9:06 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> >> > It's not at all clear to me what that code does, I just stumbled upon
> >> > __mutex_owner() outside of the mutex code itself and went WTF.
> >>
> >> If you don't want people to use __mutex_owner() outside of the mutex
> >> code I might suggest adding a rather serious comment at the top of the
> >> function, because right now I don't see anything suggesting that
> >> function shouldn't be used.  Yes, there is the double underscore
> >> prefix, but that can mean a few different things these days.
> >
> > Find below.
> >
> >> > The comment (aside from having the most horrible style) ...
> >>
> >> Yeah, your dog is ugly too.  Notice how neither comment is constructive?
> >
> > I'm sure you've seen this one:
> >
> >   https://lkml.org/lkml/2016/7/8/625
> 
> Yep.  I stand behind my earlier comment in this thread.
> 
> >> > Maybe if you could explain how that code is supposed to work and why it
> >> > doesn't know if it holds a lock I could make a suggestion...
> >>
> >> I just spent a few minutes looking back over the bits available in
> >> include/linux/mutex.h and I'm not seeing anything beyond
> >> __mutex_owner() which would allow us to determine the mutex owning
> >> task.  It's probably easiest for us to just track ownership ourselves.
> >> I'll put together a patch later today.
> >
> > Note that up until recently the mutex implementation didn't even have a
> > consistent owner field. And the thing is, it's very easy to use wrong,
> > only today I've seen a patch do: "__mutex_owner() == task", where task
> > was allowed to be !current, which is just wrong.
> 
> Arguably all the more reason why a strongly worded warning is
> important (which I see you've included below, feel free to include my
> Reviewed-by).
> 
> > Looking through kernel/audit.c I'm not even sure I see how you would end
> > up in audit_log_start() with audit_cmd_mutex held.
> >
> > Can you give me a few code paths that trigger this? Simple git-grep is
> > failing me.
> 
> Basically look at the code in audit_receive_msg(), but I wasn't asking
> your opinion on how we should rewrite the audit subsystem, I was just
> asking how one could determine if the current task was holding a given
> mutex in a way that was acceptable to you.  Based on your comments,
> and some further inspection of the mutex code, it appears that is/was
> not something that the core mutex code wants to support/make-visible.
> Which is perfectly fine, I just wanted to make sure I wasn't missing
> something before I went ahead and wrote a wrapper around the mutex
> code for use by audit.
> 
> FWIW, I just put together the following patch which removes the
> __mutex_owner() call from audit and doesn't appear to break anything
> on the audit side (you're CC'd on the patch).  It has only been
> lightly tested, but I'm going to bang on it for a day or so and if I
> hear no objections I'll merge it into audit/next.
>
> * https://www.redhat.com/archives/linux-audit/2018-February/msg00066.html

Could you please explain the audit_ctl_lock()/unlock() primitive you are 
introducing there? You seem to be implementing some sort of recursive locking 
primitive, but in a strange way.

AFAICS the primary problem appears to be this code path:

  audit_receive() -> audit_receive_msg() -> AUDIT_TTY_SET -> audit_log_common_recv_msg() -> audit_log_start()

where we can arrive already holding the lock.

I.e. recursive mutex, kinda.

What's the thinking there? Neither the changelog nor the code explains this.
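
To make the question concrete, here is a rough userspace sketch of the
pattern I think is being attempted. It is not taken from the patch:
pthreads stand in for the kernel mutex API, and every name in it
(ctl_mutex, ctl_lock(), ctl_unlock(), ctl_lock_held(), log_start(),
receive_msg()) is made up for illustration. The only point is that the
wrapper records ownership itself and can only ever answer "does the
*calling* thread hold the lock?", never "who holds it?".

```c
/* Userspace sketch only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t ctl_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread flag: set only by the thread that took ctl_mutex. */
static _Thread_local bool ctl_held_by_me;

static void ctl_lock(void)
{
	pthread_mutex_lock(&ctl_mutex);
	ctl_held_by_me = true;
}

static void ctl_unlock(void)
{
	/* Catch an unlock from a thread that never took the lock. */
	assert(ctl_held_by_me);
	ctl_held_by_me = false;
	pthread_mutex_unlock(&ctl_mutex);
}

/* Answers only for the calling thread, so there is no cross-thread read. */
static bool ctl_lock_held(void)
{
	return ctl_held_by_me;
}

/*
 * Stand-in for a logging path that can be entered with or without the
 * lock held. Returns 1 if the caller already held the lock, 0 otherwise.
 */
static int log_start(void)
{
	if (ctl_lock_held())
		return 1;	/* already inside the locked region: must not block on it */

	ctl_lock();		/* stand-in for waiting on whoever holds the lock */
	ctl_unlock();
	return 0;
}

/* Stand-in for a message handler that logs while holding the lock. */
static int receive_msg(void)
{
	int ret;

	ctl_lock();
	ret = log_start();
	ctl_unlock();
	return ret;
}
```

Calling log_start() directly takes and drops the lock, while calling it
via receive_msg() skips the lock because the caller already holds it. If
that is the intent, it is a "don't deadlock on myself" check rather than
a real recursive lock, and I'd like to see that spelled out.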

Thanks,

	Ingo