Hi.
(2013/10/26 0:12), Eric Paris wrote:
On Fri, 2013-10-25 at 10:36 +0900, Toshiyuki Okajima wrote:
> systemd                                    |auditd
> -------------------------------------------+-----------------------------------
> ...                                        |
> -> audit_receive                           |...
> -> mutex_lock(&audit_cmd_mutex)            |-> audit_receive
> ... -> audit_log_start                     | -> mutex_lock(&audit_cmd_mutex)
> -> wait_for_auditd                         | // wait for systemd
> -> schedule_timeout(60*HZ)                 |
Ugggh, definitely a problem. Adding a similar hack to systemd really
does not seem like an acceptable answer. It seems to me that in
I think so, too.
We should fix it in a way that covers the various cases.
audit_receive_msg()
case AUDIT_USER:
case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
we do not need to hold the audit_cmd_mutex. So a quick and dirty patch
should be to just drop the mutex there (and we need to verify there
aren't issues running the audit_filter_user() without the lock). That
will take care of systemd and anything USING audit. It still means that
you could race with something configuring audit and auditd shutting
down. Seems like a good quick and dirty 'fix' while we work on a better
fix...
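
To make the quick fix concrete, here is a rough sketch of the decision it implies: which message types could skip audit_cmd_mutex. The helper name msg_needs_cmd_mutex() is hypothetical and does not exist in the kernel; the constants are the values from include/uapi/linux/audit.h. It is only an illustration of the idea, not a patch, and it says nothing about whether audit_filter_user() is safe without the lock.

```c
#include <stdbool.h>

/* Values as defined in the kernel's include/uapi/linux/audit.h. */
#define AUDIT_USER             1005
#define AUDIT_FIRST_USER_MSG   1100
#define AUDIT_LAST_USER_MSG    1199
#define AUDIT_FIRST_USER_MSG2  2100
#define AUDIT_LAST_USER_MSG2   2999

/*
 * Hypothetical helper (an assumption for illustration, not kernel code):
 * returns false for the userspace-originated message types that the
 * quick fix would handle without holding audit_cmd_mutex, and true for
 * everything else (configuration commands and the like).
 */
bool msg_needs_cmd_mutex(int msg_type)
{
    if (msg_type == AUDIT_USER)
        return false;
    if (msg_type >= AUDIT_FIRST_USER_MSG && msg_type <= AUDIT_LAST_USER_MSG)
        return false;
    if (msg_type >= AUDIT_FIRST_USER_MSG2 && msg_type <= AUDIT_LAST_USER_MSG2)
        return false;
    return true;
}
```

With something like this, the receive path would take the mutex only when the helper returns true, so a sender stuck in wait_for_auditd() for a user message would no longer block auditd.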
To take care of that I think maybe we could drop the cmd_mutex every
time we call audit_log_start. That's not necessarily going to be
pretty. Maybe make a new switch at the top of the function which knows
which operations we are going to have to allocate an audit_buffer. Drop
the lock, allocate the buffer, then retake the lock to finish running
audit_receive_msg()....
Maybe that second option isn't so hard and we can go directly after that
instead of just dealing with userspace audit messages?
Thoughts?
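
The second option (drop the lock, allocate the buffer, retake the lock) can be modelled in userspace as below. This is only a sketch under assumptions: cmd_lock is an atomic_flag standing in for audit_cmd_mutex, log_start_may_sleep() stands in for audit_log_start(), and receive_msg_locked() is a hypothetical name for the part of audit_receive_msg() that runs with the lock held. None of these are kernel functions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for audit_cmd_mutex (a spinning flag, not a real mutex). */
static atomic_flag cmd_lock = ATOMIC_FLAG_INIT;

bool cmd_trylock(void) { return !atomic_flag_test_and_set(&cmd_lock); }
void cmd_lock_acquire(void) { while (!cmd_trylock()) { /* spin */ } }
void cmd_unlock(void) { atomic_flag_clear(&cmd_lock); }

/* Records whether another task could have taken the lock during allocation. */
bool lock_free_during_alloc;

/* Stand-in for audit_log_start(), which may sleep in wait_for_auditd(). */
void *log_start_may_sleep(void)
{
    /* Probe the lock the way auditd would when it enters audit_receive. */
    lock_free_during_alloc = cmd_trylock();
    if (lock_free_during_alloc)
        cmd_unlock();
    return malloc(64);
}

/* Hypothetical handler: called with cmd_lock held, returns with it held. */
int receive_msg_locked(bool needs_buffer)
{
    void *ab = NULL;

    if (needs_buffer) {
        cmd_unlock();               /* do not sleep while holding the lock */
        ab = log_start_may_sleep();
        cmd_lock_acquire();         /* retake it to finish the message */
        /* State guarded by the lock may have changed while it was
         * dropped, so a real implementation would have to recheck it. */
        if (!ab)
            return -1;
    }

    /* ... the rest of the message handling would run here, under the lock ... */
    free(ab);
    return 0;
}
```

In this model, auditd's attempt to take the lock succeeds while the sender is waiting for a buffer, which is the property the deadlock diagram above is missing. The open question it does not answer is what state needs revalidating after the lock is retaken.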
Does that mean we could also fix the problem in userspace alone?
Even if we fix only the userspace processes (auditd, readahead-collector and systemd),
the problem would happen again whenever a new userspace audit process is implemented.
Therefore, I think we should fix it in the kernel only.
Sorry, but I don't have a clear method to fix it.
Regards,
Toshiyuki Okajima