On Fri, 2013-10-25 at 10:36 +0900, Toshiyuki Okajima wrote:
systemd                                    |auditd
-------------------------------------------+-----------------------------------
...                                        |
-> audit_receive                           |...
-> mutex_lock(&audit_cmd_mutex)            |-> audit_receive
... -> audit_log_start                     | -> mutex_lock(&audit_cmd_mutex)
-> wait_for_auditd                         | // wait for systemd
-> schedule_timeout(60*HZ)                 |

Ugggh, definitely a problem. Adding a similar hack to systemd really
does not seem like an acceptable answer. It seems to me that in
audit_receive_msg()

	case AUDIT_USER:
	case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
	case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:

we do not need to hold the audit_cmd_mutex. So a quick and dirty patch
would be to just drop the mutex there (we would need to verify that
running audit_filter_user() without the lock is safe). That will take
care of systemd and anything else that merely uses audit. It still means
that something configuring audit could race with auditd shutting down,
but it seems like a good quick and dirty 'fix' while we work on a better
one.
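
To make the quick fix concrete, here is a rough userspace sketch of the
test that would decide whether a message is one of the three cases
listed above and can therefore run without audit_cmd_mutex. The helper
name audit_msg_is_user() is hypothetical, and the constant values are
stand-ins that I believe match include/uapi/linux/audit.h; check them
against the real header before relying on them.

```c
#include <stdbool.h>

/* Stand-in constants; values assumed to match include/uapi/linux/audit.h. */
enum {
	AUDIT_USER            = 1005,
	AUDIT_FIRST_USER_MSG  = 1100,
	AUDIT_LAST_USER_MSG   = 1199,
	AUDIT_FIRST_USER_MSG2 = 2100,
	AUDIT_LAST_USER_MSG2  = 2999,
};

/*
 * Hypothetical helper: true for userspace-originated audit records,
 * i.e. the cases where the mutex could be dropped in the quick fix.
 */
static bool audit_msg_is_user(int msg_type)
{
	return msg_type == AUDIT_USER ||
	       (msg_type >= AUDIT_FIRST_USER_MSG &&
		msg_type <= AUDIT_LAST_USER_MSG) ||
	       (msg_type >= AUDIT_FIRST_USER_MSG2 &&
		msg_type <= AUDIT_LAST_USER_MSG2);
}
```

Everything that is not matched here (configuration requests and the
like) would keep taking the mutex exactly as it does today.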

To take care of that, I think we could drop the cmd_mutex every time
we call audit_log_start(). That's not necessarily going to be pretty.
Maybe add a new switch at the top of the function that knows which
operations will need to allocate an audit_buffer: drop the lock,
allocate the buffer, then retake the lock to finish running
audit_receive_msg().
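
The drop/allocate/retake idea could look roughly like the userspace
model below. It is only a sketch of the lock ordering, not kernel code:
every sketch_* name and both operation types are made up for
illustration, a small C11 spinlock stands in for audit_cmd_mutex, and
malloc() stands in for audit_log_start(). It assumes the function is
entered with the lock held, as in the diagram above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical operation types; not the kernel's real message types. */
enum sketch_op { SKETCH_OP_QUERY, SKETCH_OP_CONFIG_CHANGE };

/* Tiny spinlock standing in for audit_cmd_mutex. */
static atomic_bool sketch_cmd_lock;

static void sketch_lock(void)
{
	while (atomic_exchange(&sketch_cmd_lock, true))
		;	/* spin until the previous holder releases it */
}

static void sketch_unlock(void)
{
	atomic_store(&sketch_cmd_lock, false);
}

struct sketch_buffer { enum sketch_op op; };

/* Stand-in for audit_log_start(): may block, so call it unlocked. */
static struct sketch_buffer *sketch_log_start(enum sketch_op op)
{
	struct sketch_buffer *ab = malloc(sizeof(*ab));

	if (ab)
		ab->op = op;
	return ab;
}

/* The "new switch at the top": which operations need a buffer? */
static bool sketch_needs_buffer(enum sketch_op op)
{
	switch (op) {
	case SKETCH_OP_CONFIG_CHANGE:
		return true;
	default:
		return false;
	}
}

/* Entered and left with the lock held. */
static int sketch_receive_msg(enum sketch_op op)
{
	struct sketch_buffer *ab = NULL;

	if (sketch_needs_buffer(op)) {
		sketch_unlock();	/* do not wait with the lock held */
		ab = sketch_log_start(op);
		sketch_lock();		/* retake before touching state */
		if (!ab)
			return -1;
		/* State guarded by the lock may have changed; recheck here. */
	}

	/* ... handle the operation under the lock, logging into ab ... */
	free(ab);
	return 0;
}
```

The ugly part is the window while the lock is dropped: anything the
function looked at beforehand has to be revalidated once the lock is
retaken.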

Maybe that second option isn't so hard and we can go directly after that
instead of just dealing with userspace audit messages?

Thoughts?