Hi,
I'm still working on some bugs that I found in libaudit over the weekend. I
modified pam and passwd to log events over the audit netlink connection and,
as a result, ran into a problem. It is probably best illustrated by showing
how to reproduce it in auditctl.c.
Open auditctl.c and look for reset_vars(). That function calls audit_open().
Add a second call to audit_open() so that it looks like this:
static int reset_vars(void)
{
        list_requested = 0;
        syscalladded = 0;
        add = 0;
        del = 0;
        action = 0;
        memset(&rule, 0, sizeof(rule));
        audit_open(); // this is added. we don't care what the return is.
        if ((fd = audit_open()) < 0) {
                fprintf(stderr, "Cannot open netlink audit socket\n");
                return 1;
        }
        return 0;
}
This makes the application open two netlink connections to the audit system.
Compile it and try ./auditctl -s. Using strace, this is what I get (with my
annotations):
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
_sysctl({{CTL_KERN, KERN_VERSION}, 2, 0xbfec898c, 31, (nil), 0}) = 0
socket(PF_NETLINK, SOCK_RAW, 9) = 3
<- first open ->
bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
socket(PF_NETLINK, SOCK_RAW, 9) = 4
<- second open ->
bind(4, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
sendto(4, "\20\0\0\0\350\3\1\0gE\213k\0\0\0\0", 16, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 16
<- send request , now get answer ->
recvfrom(4, 0xbfec7ed0, 1216, 64, 0xbfec7e60, 0xbfec7e5c) = -1 EAGAIN (Resource temporarily unavailable)
write(2, "Error receiving netlink packet ("..., 65Error receiving netlink packet (Resource temporarily unavailable)) = 65
write(2, "\n", 1
) = 1
<- error? ->
nanosleep({0, 100000000}, NULL) = 0
recvfrom(4, 0xbfec7ed0, 1216, 64, 0xbfec7e60, 0xbfec7e5c) = -1 EAGAIN (Resource temporarily unavailable)
write(2, "Error receiving netlink packet ("..., 65Error receiving netlink packet (Resource temporarily unavailable)) = 65
write(2, "\n", 1
<- error? ->
As you can see, the error messages scroll because every read returns EAGAIN.
This is a real problem right now, and I'm not sure how best to solve it short
of making a request, closing the descriptor, and reopening it for each
communication with the kernel.
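
In the trace, the 64 passed as the recvfrom() flags is MSG_DONTWAIT on Linux, so each read returns at once and the caller retries after a 0.1 second nanosleep. This does not fix the missing reply, but a minimal sketch of a bounded wait would at least stop the scrolling. It assumes a plain poll() followed by a non-blocking recv(); the name recv_reply_bounded is made up for illustration and is not a libaudit function.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of libaudit): wait up to timeout_ms for a
 * reply on fd, then read it without blocking. Returns the number of bytes
 * read, or -1 with errno set (EAGAIN if nothing arrived in time).
 */
static ssize_t recv_reply_bounded(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    /* Restart poll() if a signal interrupts it (the timeout starts over). */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;          /* poll() itself failed; errno is already set */
    if (rc == 0) {
        errno = EAGAIN;     /* timed out: report once instead of spinning */
        return -1;
    }
    return recv(fd, buf, len, MSG_DONTWAIT);
}
```

A caller would print one error when this returns -1 and give up, rather than looping on EAGAIN forever.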
What happens in real life is that passwd opens a socket to log some data to
the audit system and then collects the passwords. If everything is OK, it
passes the passwords to pam for the authentication token update. Pam decides
that it needs to do some logging of its own and opens descriptors to the
audit system. They fail as above, with EAGAIN.
Do any of you kernel hackers know why apps are limited to one netlink socket
connection? Can someone else verify the problem?
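
For anyone trying to verify this, one thing that may be worth comparing is the netlink address the kernel assigns to each socket when it is bound with pid=0, as in the trace. The sketch below only reports that address and I have not confirmed that it is related to the failure. The helper name open_audit_socket is hypothetical; the socket(), bind() and getsockname() calls are the ordinary ones, and NETLINK_AUDIT is the protocol 9 seen in the strace output.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical diagnostic helper: open a NETLINK_AUDIT socket the way the
 * trace shows (bind with pid=0, groups=0) and report the netlink address
 * the kernel assigned. Returns the descriptor, or -1 on error.
 */
static int open_audit_socket(unsigned int *assigned_pid)
{
    struct sockaddr_nl addr;
    socklen_t len = sizeof(addr);
    int fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_AUDIT);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;    /* nl_pid = 0: let the kernel choose */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *assigned_pid = addr.nl_pid;    /* address actually in use after bind */
    return fd;
}
```

Calling it twice in the same process and printing both values shows what each of the two connections was bound to.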
I think I can fix the problem by constantly closing and opening connections,
but that is ugly and inefficient. This "bug/feature" is holding up the
release of the next version of audit and of the patched trusted programs.
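
To make the open-and-close workaround concrete, here is a rough sketch of what one communication would look like if every request used its own short-lived socket. It is written against raw netlink sockets rather than libaudit, and netlink_oneshot and its parameters are invented for illustration. The caller is assumed to supply an already built netlink message, such as the 16-byte request seen in the trace.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not libaudit API): open a fresh netlink socket, send
 * one prepared message to the kernel, wait up to timeout_ms for one reply,
 * and close the socket again. Returns the reply length, or -1 on error.
 */
static ssize_t netlink_oneshot(int protocol, const void *msg, size_t msg_len,
                               void *reply, size_t reply_len, int timeout_ms)
{
    struct sockaddr_nl addr;
    struct pollfd pfd;
    ssize_t n = -1;
    int saved_errno;
    int fd = socket(PF_NETLINK, SOCK_RAW, protocol);

    if (fd < 0)
        return -1;

    /* pid=0, groups=0: kernel picks our address; also names the kernel as peer. */
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        sendto(fd, msg, msg_len, 0,
               (struct sockaddr *)&addr, sizeof(addr)) >= 0) {
        int rc;

        pfd.fd = fd;
        pfd.events = POLLIN;
        rc = poll(&pfd, 1, timeout_ms);
        if (rc > 0)
            n = recv(fd, reply, reply_len, MSG_DONTWAIT);
        else if (rc == 0)
            errno = EAGAIN;         /* no reply within the timeout */
    }

    /* Close on every path so no descriptor outlives the request. */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return n;
}
```

Each call pays for a socket(), bind() and close(), which is the overhead I would like to avoid.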
Thanks,
-Steve Grubb