On Monday, April 29, 2013 11:15:16 AM Luciano Chavez wrote:
Hello,
I am working with a development team developing a J2EE application. They
reported a problem with a crash in audit_send(). The crash occurred in a
ppc64 architecture environment early on in the invocation to audit send.
The crash occurs in this instruction which is establishing the size of
the local stack:
=> 0xfff73237994 <audit_send+52>: stdu r1,-27232(r1)
I found one large struct defined to a local variable
(gdb) print sizeof(struct audit_message)
$4 = 8988
but you will note that it asks for much more than that and after looking
at the audit_send() routine, it calls a function called check_ack()
At the time it calls check_ack, its done with that variable and the compiler
could have retired that area of the stack for reuse by something else.
which appears to be inlined
It doesn't have the inline keyword.
and it contains two even larger definitions
on the stack for the following structure:
struct audit_reply
(gdb) print sizeof(struct audit_reply)
$3 = 9016
They should be the same size. One is in an if statement that only executes on
a netlink message error. That should not be a normal case.
So, the combination of the three is what requires almost 26.5K of
local
stack usage in this frame alone.
Is there a requirement for libaudit to have the structs on the stack
versus allocated from heap? Is so, is this requirement documented
somewhere?
The audit daemon has to be reliable, It was designed to use as few malloc's as
possible so that nothing unexpected can happen, like memory fragmentation or
stopping because an allocation failed.
To be fair, the Java application has some heavy stack usage as it is
since it is deployed in a web application server and there is a JNI
function that is somewhere in the call stack as well. However, the stack
usage in the audit_send() function seems ... excessive.
Well, if you are within 26k of crashing due to being out of stack, then you
might just be shifting the problem to something else cause you're about out of
stack. :-) To me, it sounds a bit like the compiler could be more aggressive
on reclaiming the unused stack area once it sees stack variables expire.
Originally the thread stacksize size was set to 256K and that did
not
help but once we raised it to 1MB it did but I think that is probably
more than we really need.
I have looked at the source for the audit 2.2.3 release from March and
don't see a difference in how the structs are allocated. So once again,
if there is not a requirement that the structs be on the stack, should
they not be allocated off the heap?
Well, except that I am trying real hard to avoid malloc in all functions that
are executed by auditd. You'll find it in other places, like library functions
that are used by other apps or things that parse and analyze. They aren't as
critical as auditd staying running.
I'm open to ideas. I don't know if putting req.nlh.nlmsg_len into a tmp
variable would help the compiler realise it can reclaim the area occupied by
req or not. I might be able to get rid of rep2 by reusing the area of rep
after coping out the important part of it. But this part of the code hasn't
significantly changed in 7-8 years.
-Steve