Re: Kernel audit output is inconsistent, hard to parse

Wednesday, 30 January 2008

Steve Grubb wrote:
...
 On Tuesday 29 January 2008 17:56:36 John Dennis wrote: 
...
 Hence the audit parsing library. The idea is to abstract this away so that
 anyone wanting to write a tool does not need to study all the messages and
 figure out the parsing rules.
...
 The way forward has to be the audit parsing library. 
The problem is auparse is just as screwed as anybody else. Unparseable 
output is just plain wrong and inexcusable. You're suggesting auparse 
embed all sorts of hacks and heuristics to unravel a problem which 
should never exist in the first place. It's a house of cards which in 
time will collapse. You also haven't explained how auparse is going to 
deal with log data generated by different kernel versions, especially 
when logs are aggregated.

...
 tools developed around these messages and making wholesale changes will break
 them.
Break what is already fundamentally broken? That's not an answer ;-)

...
 Any fix will break someone's tool somewhere unless they are coded to the audit
 parsing library.
auparse is going to break too. The current situation is that you can't 
determine whether a field is encoded by reading the output; you also 
have to know the kernel source code. That's wrong.

...
> Auparse is not the answer to irregular kernel audit message

...
 This is the answer in so many ways. In order to make any change, you have to
 decouple applications from the actual data structure. You cannot normalize
 the data without breaking somebody somewhere.
Which is why making the output parseable independent of the 
kernel version is an essential requirement.

...
 For example, suppose we all agreed the data structure is an abomination and
 had to be fixed. We get all the code into 2.6.26 kernel. Meanwhile Fedora 9 
 is released on the 2.6.24 kernel. We get the user space pieces fixed up to be 
 released at the same time as 2.6.26. Then Fedora steps up to 2.6.25 kernel 
 and then ultimately 2.6.26. The userspace in Fedora 9 was never intended to 
 work with the new format. We can't keep the kernel team from doing what's 
 right for everyone that wants new device drivers. We're stuck. 
You're only stuck if the output can only be parsed by one version. If 
the output were regular the problem would go away. Isn't that the desired 
result?

...
> auparse_get_field_str() returns the field value in its encoded form,
...
 I would choose the words, raw form. 
Yes, raw is a better term. Some raw values are encoded, some aren't, 
that's the problem.

...
> this is almost never of value to the caller. The caller wants the
> field value to be unencoded so it can operate on it.

 Sometimes. It depends on the situation. 
Very rarely. As an analogy 99.99% of the time you want your email client 
to decode the contents from the transfer encoding it arrived in, 
otherwise it's just gibberish. Raw form is really only useful when 
debugging the encoding/decoding.

...
> If you want the field value to be unencoded you have to call
> auparse_interpret_field().

 Correct. 
...
> But auparse_interpret_field() performs two distinctly different operations,

 It does only one thing, that is translate the data from raw to interpreted 
 form. 
Wrong :-) It does two entirely different things and those operations 
cannot be separated. The two operations are:

1) decoding (e.g. decoding a field value encoded in hexadecimal form 
back into its original string)

2) interpretation (e.g. translating a uid field into a username). I call 
this interpretation "contextual substitution" because it's taking a 
field value and substituting in another value, often in a different 
format. You cannot interpret a field value until it has been decoded.
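
To make the distinction concrete, here is a minimal sketch of the two operations kept as separate steps. It is not auparse code: the function names and the uid table are hypothetical and exist only for illustration.

```python
# Hypothetical uid table (an assumption for this sketch); a real tool
# would consult the system's account database instead.
UID_NAMES = {0: "root", 500: "jdennis"}


def decode_hex(raw):
    """Step 1, decoding: turn a hex-encoded field value back into its string."""
    return bytes.fromhex(raw).decode("utf-8", errors="replace")


def interpret_uid(value):
    """Step 2, interpretation: substitute a user name for a numeric uid."""
    return UID_NAMES.get(int(value), "unknown(%s)" % value)


# Decoding alone recovers the original field value without changing it.
path = decode_hex("2F746D702F6D792066696C65")

# Interpretation replaces the value with something in a different format.
user = interpret_uid("0")
```

With the steps separated, a caller who only wants the real field value can stop after decode_hex() and never has to accept a substituted value.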

What if I don't want auparse to change the field value and instead 
simply return the field value? Currently you can't simply get the field 
value! Why? Because some fields are encoded, so you either get the raw 
encoded value (which is meaningless 99.99% of the time, if it had been 
encoded) or you get something which is completely munged.

...
 So, John, if you want selinux format changes, complain on their mail list.
 I've already done that and lost. :) 
FWIW, I can live with not changing the message contents. But no one can 
live with a situation where the data can't be parsed; it is simply 
wrong. Just to be clear, the problem is that you can't determine while 
parsing whether a field value is encoded, which means you can't decide 
whether it has to be decoded.

Here is an example from the real world. An audit message has this field:

comm=df

So is the value the string "df" (e.g. disk free) or is this the 
hexadecimal encoded byte value 223? The only way to know is by looking 
at the kernel source code and knowing that the "comm" field in a 
specific audit record is generated by calling 
audit_log_untrustedstring(). What if it doesn't call that in a 
different kernel version? What if a new field is added in a new kernel 
version, how will the parser know which function the kernel used to 
generate the string? What if in one kernel version the string was output 
with audit_log_untrustedstring() but in another kernel version it wasn't?
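
The ambiguity can be shown with a short sketch. It assumes nothing about the kernel beyond what is described above, and the helper names are hypothetical: given only the text of the field value, a parser can list the possible readings but has no way to choose between them.

```python
import string


def looks_like_hex(raw):
    """True if raw could be a hex-encoded byte string (even count of hex digits)."""
    return len(raw) > 0 and len(raw) % 2 == 0 and all(
        c in string.hexdigits for c in raw
    )


def candidate_readings(raw):
    """Return every byte string the raw field value could stand for."""
    readings = [raw.encode("utf-8")]  # reading 1: the value is a literal string
    if looks_like_hex(raw):
        readings.append(bytes.fromhex(raw))  # reading 2: the value is hex-encoded
    return readings


# "df" is a valid literal and also valid hex, so two readings come back:
# the two characters "df", and the single byte 0xdf (decimal 223).
ambiguous = candidate_readings("df")

# "bash" contains non-hex characters, so only the literal reading is possible.
unambiguous = candidate_readings("bash")
```

Any value made up of an even number of hex digits falls into the ambiguous case, so a heuristic based on the text alone will guess wrong for some inputs.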

-- 
John Dennis <jdennis(a)redhat.com>
