Hello,
Having the kernel detect a signal being sent to the audit daemon is not
working. Is anyone troubleshooting this or do we take another approach?
I spent some time yesterday thinking about the shutdown. I came to the
conclusion that the only way to "do it right" is to get the credentials in
the signal handler. Everything else is racy.
THE PROBLEM
When I get the term signal, I would need to wait for the event to be logged to
disk. So that means I have to inspect each packet and wait until the shutdown
message comes through. But what if the backlog was full when that event would
have been enqueued?
Also, suppose I have a timeout. When the timeout occurs, I have 2 choices:
set the audit pid to 0 and then close the socket, or just close the socket.
If I just close the socket, I get this message in the logs:
Apr 11 16:55:04 localhost kernel: audit: *NO* daemon at audit_pid=15734
This looks ugly. But if I set the pid to 0, we don't get that message in the
logs. But I am using the ack flag for positive confirmation of all netlink
communication. So what if the signal event is the first thing I read from the
socket instead of the ack? Meaning the event was delivered just after the
timeout and before the logging thread finished?
Besides, by using a timeout, we do not meet the requirements. If the timeout
occurs and we go ahead and shut down, we simply don't have the information
about who initiated the shutdown.
I can come up with more scenarios that show we can't meet the CAPP
requirements by having an event placed into the message queue. The only way
to guarantee that we meet requirements is for the credentials to be available
*with* the signal delivery.
ALTERNATIVES
What I believe we should do is one of 2 things: either create a SA_AUDITINFO
flag and structure that can be delivered with the signal, or swap the values
of 2 entries in the siginfo_t structure. Between the two, I think SA_AUDITINFO
is the correct way to do it. But I would like to examine swapping values first.
We need to think about LSPP as we do this and solve both problems while we are
in this area. LSPP will require that we log the credentials of the initiator.
This would be the SE Linux sid. It is kept in the kernel as a u32 data type. The
user id is kept as uid_t. So, we need to find 2 elements in the siginfo_t
structure that we can replace with our data.
The si_uid fits the loginuid perfectly. The si_uid normally indicates the user
that sent the signal. Since the audit daemon runs as root, only root
processes can send signals to it. So basically, every time we get a signal,
this element will be root which is meaningless. We can replace it with the
loginuid and now it has meaning.
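
To make the si_uid idea concrete, here is a rough userspace sketch of how the daemon could pick the value up in its SIGTERM handler. It only uses the standard SA_SIGINFO interface. The names term_handler, install_term_handler, term_seen, sender_pid and sender_uid are made up for illustration. With today's kernel si_uid is the sender's real uid; the loginuid would only show up there if the kernel were changed as proposed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Written by the handler, read by the main loop after term_seen is set.
 * Sketch only: a real daemon may want stricter async-signal handling. */
volatile sig_atomic_t term_seen = 0;
volatile pid_t sender_pid = 0;
volatile uid_t sender_uid = 0;

/* Hypothetical handler: copy the sender fields out of siginfo_t.
 * Under the proposal, si_uid would carry the loginuid instead. */
static void term_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    sender_pid = info->si_pid;
    sender_uid = info->si_uid;
    term_seen = 1;
}

/* Install the handler with SA_SIGINFO so the kernel passes siginfo_t.
 * Returns 0 on success, -1 on failure (same as sigaction). */
int install_term_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = term_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}
```

The point of doing it this way is that the credentials arrive in the same delivery as the signal, so there is nothing to wait for and no queue to race against.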
The SE Linux sid is tougher to fit. Because Linux is deployed on 16 bit
platforms, we cannot use any int in the siginfo_t structure and be correct.
We have to find something that is a long. In include/asm-generic/siginfo.h,
we can see the structure. A quick grep for long finds this:
#ifndef __ARCH_SI_BAND_T
#define __ARCH_SI_BAND_T long
#endif
We do not use poll in the audit daemon, so this might be a good candidate.
Another candidate would be anything with clock_t. Looking at the per arch
definition, they all seem to be long. So this means si_stime or si_utime
have the right sizes.
The only issue left is choosing which one we want to use and agreeing on that.
Since long is signed and the SE Linux sid is u32, we need to take care to
load it correctly so we don't get sign extension. It needs to be cast to
unsigned long and then long.
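
A minimal sketch of the cast order described above, assuming the sid is a 32 bit unsigned value (uint32_t stands in for the kernel's u32). The helper names sid_to_long and long_to_sid are made up for illustration and are not existing kernel or audit functions.

```c
#include <stdint.h>

/* Hypothetical helper: place a 32 bit sid into a signed long field.
 * Going through unsigned long first means the value is zero-extended
 * on platforms where long is wider than 32 bits, so the top bit of
 * the sid is never treated as a sign bit during widening. */
long sid_to_long(uint32_t sid)
{
    return (long)(unsigned long)sid;
}

/* Hypothetical helper: recover the sid on the receiving side by
 * reversing the casts and keeping the low 32 bits. */
uint32_t long_to_sid(long field)
{
    return (uint32_t)(unsigned long)field;
}
```

By contrast, going through a signed 32 bit type first, as in (long)(int32_t)sid, would sign extend any sid with the high bit set when long is 64 bits.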
The other way of delivering credentials with the signal is to create a new
SA_AUDITINFO flag and a new structure to hold our information:
typedef struct sigauditinfo {
    int   sa_signo;   /* Signal number */
    int   sa_errno;   /* An errno value */
    int   sa_code;    /* Signal code */
    pid_t sa_pid;     /* Sending process ID */
    sid   sa_pidsid;  /* Sending process sid */
    uid_t sa_uid;     /* Real user ID of sending process */
    sid   sa_uidsid;  /* Real user's sid */
    uid_t sa_luid;    /* Login user ID of sending process */
    int   sa_status;  /* Exit value or signal */
} sigauditinfo_t;
This structure could be added to a union to ensure that it is the same size as
siginfo_t. This will keep the stack unwinders happy. The above structure
could be expanded to also include:
    clock_t  si_utime;  /* User time consumed */
    clock_t  si_stime;  /* System time consumed */
    sigval_t si_value;  /* Signal value */
    int      si_int;    /* POSIX.1b signal */
    void *   si_ptr;    /* POSIX.1b signal */
    void *   si_addr;   /* Memory location which caused fault */
    long     si_band;   /* Band event */
    int      si_fd;     /* File descriptor */
But if we do that, we are too big to be in a union without increasing the
overall size. We could overcome this problem by using si_addr to point to a
new structure whenever there's no address fault. That address would be valid
only until the signal handler returns or is longjmp'ed out of.
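
The union idea for the small version of the structure can be sanity checked in userspace. This is only a sketch under assumptions: audit_sid_t is a made-up stand-in for the sid type (taken to be 32 bits, as the kernel keeps it in a u32), the union name siginfo_or_audit and the helper audit_info_fits are hypothetical, and the last field is spelled sa_status because glibc defines si_status as a macro in <signal.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>

typedef uint32_t audit_sid_t;   /* assumption: sid is a 32 bit value */

struct sigauditinfo {
    int         sa_signo;   /* Signal number */
    int         sa_errno;   /* An errno value */
    int         sa_code;    /* Signal code */
    pid_t       sa_pid;     /* Sending process ID */
    audit_sid_t sa_pidsid;  /* Sending process sid */
    uid_t       sa_uid;     /* Real user ID of sending process */
    audit_sid_t sa_uidsid;  /* Real user's sid */
    uid_t       sa_luid;    /* Login user ID of sending process */
    int         sa_status;  /* Exit value or signal */
};

/* Overlaying both views keeps the overall size equal to siginfo_t
 * as long as the audit structure is no larger than siginfo_t. */
union siginfo_or_audit {
    siginfo_t           info;
    struct sigauditinfo audit;
};

/* Compile-time check that the union did not grow past siginfo_t. */
_Static_assert(sizeof(union siginfo_or_audit) == sizeof(siginfo_t),
               "audit info must fit inside siginfo_t");

/* Returns 1 if the audit structure fits inside siginfo_t, else 0. */
int audit_info_fits(void)
{
    return sizeof(struct sigauditinfo) <= sizeof(siginfo_t);
}
```

If the expanded field list were added and the static assertion started failing, that would be the signal to fall back to the si_addr pointer approach.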
NEXT STEP
The next step is to decide which way is cleanest and acceptable to upstream
developers. Are there holes in either way proposed above? Can sending a
shutdown audit event via netlink be done without races?
-Steve