On Tue, Feb 20, 2018 at 10:18 AM, Peter Zijlstra
<peterz(a)infradead.org> wrote:
> On Tue, Feb 20, 2018 at 09:51:08AM -0500, Paul Moore wrote:
>> On Tue, Feb 20, 2018 at 9:06 AM, Peter Zijlstra <peterz(a)infradead.org>
wrote:
>
>> > It's not at all clear to me what that code does, I just stumbled upon
>> > __mutex_owner() outside of the mutex code itself and went WTF.
>>
>> If you don't want people to use __mutex_owner() outside of the mutex
>> code I might suggest adding a rather serious comment at the top of the
>> function, because right now I don't see anything suggesting that
>> function shouldn't be used. Yes, there is the double underscore
>> prefix, but that can mean a few different things these days.
>
> Find below.
>
>> > The comment (aside from having the most horribly style) ...
>>
>> Yeah, your dog is ugly too. Notice how neither comment is constructive?
>
> I'm sure you've seen this one:
>
>
https://lkml.org/lkml/2016/7/8/625
Yep. I stand behind my earlier comment in this thread.
>> > Maybe if you could explain how that code is supposed to work and why it
>> > doesn't know if it holds a lock I could make a suggestion...
>>
>> I just spent a few minutes looking back over the bits available in
>> include/linux/mutex.h and I'm not seeing anything beyond
>> __mutex_owner() which would allow us to determine the mutex owning
>> task. It's probably easiest for us to just track ownership ourselves.
>> I'll put together a patch later today.
>
> Note that up until recently the mutex implementation didn't even have a
> consistent owner field. And the thing is, it's very easy to use wrong,
> only today I've seen a patch do: "__mutex_owner() == task", where
task
> was allowed to be !current, which is just wrong.
Arguably all the more reason why a strongly worded warning is
important (which I see you've included below, feel free to include my
Reviewed-by).
> Looking through kernel/audit.c I'm not even sure I see how you would end
> up in audit_log_start() with audit_cmd_mutex held.
>
> Can you give me a few code paths that trigger this? Simple git-grep is
> failing me.
Basically look at the code in audit_receive_msg(), but I wasn't asking
your opinion on how we should rewrite the audit subsystem, I was just
asking how one could determine if the current task was holding a given
mutex in a way that was acceptable to you. Based on your comments,
and some further inspection of the mutex code, it appears that is/was
not something that the core mutex code wants to support/make-visible.
Which is perfectly fine, I just wanted to make sure I wasn't missing
something before I went ahead and wrote a wrapper around the mutex
code for use by audit.
FWIW, I just put together the following patch which removes the
__mutex_owner() call from audit and doesn't appear to break anything
on the audit side (you're CC'd on the patch). It has only been
lightly tested, but I'm going to bang on it for a day or so and if I
hear no objections I'll merge it into audit/next.
*
https://www.redhat.com/archives/linux-audit/2018-February/msg00066.html
Could you please explain the audit_ctl_lock()/unlock() primitive you are
introducing there? You seem to be implementing some sort of recursive locking
primitive, but in a strange way.
AFAICS the primary problem appears to be this code path:
audit_receive() -> audit_receive_msg() -> AUDIT_TTY_SET ->
audit_log_common_recv_msg() -> audit_log_start()
where we can arrive already holding the lock.
I.e. recursive mutex, kinda.
What's the thinking there? Neither the changelog nor the code explains this.
Thanks,
Ingo