On Mon, 2013-04-29 at 12:48 -0400, Steve Grubb wrote:
On Monday, April 29, 2013 11:15:16 AM Luciano Chavez wrote:
> Hello,
>
> I am working with a development team developing a J2EE application. They
> reported a problem with a crash in audit_send(). The crash occurred in a
> ppc64 architecture environment early on in the invocation to audit send.
>
> The crash occurs in this instruction which is establishing the size of
> the local stack:
>
> => 0xfff73237994 <audit_send+52>: stdu r1,-27232(r1)
>
> I found one large struct defined to a local variable
>
> (gdb) print sizeof(struct audit_message)
> $4 = 8988
>
> but you will note that it asks for much more than that and after looking
> at the audit_send() routine, it calls a function called check_ack()
At the time it calls check_ack, its done with that variable and the compiler
could have retired that area of the stack for reuse by something else.
> which appears to be inlined
It doesn't have the inline keyword.
Hi Steve,
Thanks for the quick reply. More than likely the compiler inlined the
check_ack() function as there is only one call to it from audit_send()
but even so, I am not sure that mattered.
> and it contains two even larger definitions
> on the stack for the following structure:
>
> struct audit_reply
>
> (gdb) print sizeof(struct audit_reply)
> $3 = 9016
They should be the same size. One is in an if statement that only executes on
a netlink message error. That should not be a normal case.
> So, the combination of the three is what requires almost 26.5K of local
> stack usage in this frame alone.
>
> Is there a requirement for libaudit to have the structs on the stack
> versus allocated from heap? Is so, is this requirement documented
> somewhere?
The audit daemon has to be reliable, It was designed to use as few malloc's as
possible so that nothing unexpected can happen, like memory fragmentation or
stopping because an allocation failed.
> To be fair, the Java application has some heavy stack usage as it is
> since it is deployed in a web application server and there is a JNI
> function that is somewhere in the call stack as well. However, the stack
> usage in the audit_send() function seems ... excessive.
Well, if you are within 26k of crashing due to being out of stack, then you
might just be shifting the problem to something else cause you're about out of
stack. :-) To me, it sounds a bit like the compiler could be more aggressive
on reclaiming the unused stack area once it sees stack variables expire.
> Originally the thread stacksize size was set to 256K and that did not
> help but once we raised it to 1MB it did but I think that is probably
> more than we really need.
>
> I have looked at the source for the audit 2.2.3 release from March and
> don't see a difference in how the structs are allocated. So once again,
> if there is not a requirement that the structs be on the stack, should
> they not be allocated off the heap?
Well, except that I am trying real hard to avoid malloc in all functions that
are executed by auditd. You'll find it in other places, like library functions
that are used by other apps or things that parse and analyze. They aren't as
critical as auditd staying running.
In our case the libaudit functions were invoked as a result of audit
logging after a pam_authenticate() function was called by the JNI code
in the context of java native thread. It wasn't within auditd but you
did confirm that placing these structs was intentional as part of the
design process and not something else. Thanks.
I'm open to ideas. I don't know if putting req.nlh.nlmsg_len
into a tmp
variable would help the compiler realise it can reclaim the area occupied by
req or not. I might be able to get rid of rep2 by reusing the area of rep
after coping out the important part of it. But this part of the code hasn't
significantly changed in 7-8 years.
-Steve
I saw your followup and thanks for freeing up some stack space. I saw
the diff in trunk. I'll see if the development team wants to try another
libaudit with this change to reduce there stacksize requirements a bit.
regards,
--
Luciano Chavez <lnx1138(a)linux.vnet.ibm.com>