* Paul Moore <paul@xxxxxxxxxxxxxx> wrote: > On Tue, Feb 20, 2018 at 10:18 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > On Tue, Feb 20, 2018 at 09:51:08AM -0500, Paul Moore wrote: > >> On Tue, Feb 20, 2018 at 9:06 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > > >> > It's not at all clear to me what that code does, I just stumbled upon > >> > __mutex_owner() outside of the mutex code itself and went WTF. > >> > >> If you don't want people to use __mutex_owner() outside of the mutex > >> code I might suggest adding a rather serious comment at the top of the > >> function, because right now I don't see anything suggesting that > >> function shouldn't be used. Yes, there is the double underscore > >> prefix, but that can mean a few different things these days. > > > > Find below. > > > >> > The comment (aside from having the most horribly style) ... > >> > >> Yeah, your dog is ugly too. Notice how neither comment is constructive? > > > > I'm sure you've seen this one: > > > > https://lkml.org/lkml/2016/7/8/625 > > Yep. I stand behind my earlier comment in this thread. > > >> > Maybe if you could explain how that code is supposed to work and why it > >> > doesn't know if it holds a lock I could make a suggestion... > >> > >> I just spent a few minutes looking back over the bits available in > >> include/linux/mutex.h and I'm not seeing anything beyond > >> __mutex_owner() which would allow us to determine the mutex owning > >> task. It's probably easiest for us to just track ownership ourselves. > >> I'll put together a patch later today. > > > > Note that up until recently the mutex implementation didn't even have a > > consistent owner field. And the thing is, it's very easy to use wrong, > > only today I've seen a patch do: "__mutex_owner() == task", where task > > was allowed to be !current, which is just wrong. > > Arguably all the more reason why a strongly worded warning is > important (which I see you've included below, feel free to include my > Reviewed-by). > > > Looking through kernel/audit.c I'm not even sure I see how you would end > > up in audit_log_start() with audit_cmd_mutex held. > > > > Can you give me a few code paths that trigger this? Simple git-grep is > > failing me. > > Basically look at the code in audit_receive_msg(), but I wasn't asking > your opinion on how we should rewrite the audit subsystem, I was just > asking how one could determine if the current task was holding a given > mutex in a way that was acceptable to you. Based on your comments, > and some further inspection of the mutex code, it appears that is/was > not something that the core mutex code wants to support/make-visible. > Which is perfectly fine, I just wanted to make sure I wasn't missing > something before I went ahead and wrote a wrapper around the mutex > code for use by audit. > > FWIW, I just put together the following patch which removes the > __mutex_owner() call from audit and doesn't appear to break anything > on the audit side (you're CC'd on the patch). It has only been > lightly tested, but I'm going to bang on it for a day or so and if I > hear no objections I'll merge it into audit/next. > > * https://www.redhat.com/archives/linux-audit/2018-February/msg00066.html Could you please explain the audit_ctl_lock()/unlock() primitive you are introducing there? You seem to be implementing some sort of recursive locking primitive, but in a strange way. AFAICS the primary problem appears to be this code path: audit_receive() -> audit_receive_msg() -> AUDIT_TTY_SET -> audit_log_common_recv_msg() -> audit_log_start() where we can arrive already holding the lock. I.e. recursive mutex, kinda. What's the thinking there? Neither the changelog nor the code explains this. Thanks, Ingo