Re: Kernel audit output is inconsistent, hard to parse

Wednesday, 30 January 2008

On Tuesday 29 January 2008 5:56:36 pm John Dennis wrote:
...
 The format of audit messages from the kernel is a mess. The
 bottom line is one cannot parse the audit messages without special
 case knowledge of each audit message because the data formatting does
 not follow any regular rules.

 I don't know how it got this way, but it really needs to be fixed. 
I agree.

...
 Suggested Fix:
 --------------

 Most of these problems can easily be fixed if there is exactly one
 central place to format an audit field value. The function
 audit_log_vformat() could very easily ensure consistent formatting via
 % format specifiers in the format string, e.g.:

      audit_log_format("n=%d path=%s", n, path);

 Building audit output piecewise would be deprecated, e.g. these types
 of sequences would be eliminated.

      audit_log_format(ab, " n=%d", n);
      audit_log_format(ab, " name=");
      audit_log_foo();

 and replaced with:

      audit_log_format(ab, " n=%d name=%s", foo_to_string(foo));

 Whenever audit_log_vformat() encounters a % format specifier it
 formats to a string, then it converts the string to an escaped quoted
 string, and then inserts the escaped quoted string into the buffer
 (e.g. n="123" name="foo bar\n" )

 This way the formatting is consistent, easy to apply, and is never
 special cased by the caller. 
I'm afraid that your proposed solution is not enough as it still relies on the 
caller for the overall message format.  I'm a firm believer that if you have 
strict requirements that need to be met (I think we do, your concerns are not 
new) you need to enforce those requirements through the API, relying on 
others to adhere to a "gentlemen's agreement" regarding the careful use of a

flexible API is always going to be problematic.

What I have in mind is an API similar to the following:

 /* Audit field names */
 enum {
   AUD_NAME_UID,
   AUD_NAME_GUID,
   AUD_NAME_PATH,
   ... etc ...
 }

 /* Audit field value types */
 enum {
   AUD_TYPE_BOOL,
   AUD_TYPE_U32,
   AUD_TYPE_S32,
   AUD_TYPE_U64,
   AUD_TYPE_S64,
   AUD_TYPE_STRING,
   AUD_TYPE_STRING_RAW,
   ... etc ...
 } audit_value_type;

 /**
  * audit_log_field - Record a name=value field in the audit message
  * @buffer: Audit buffer/message
  * @name: Audit message field name
  * @type: Audit message field value type
  * @length: Audit message field value length
  * @value: Audit message field value
  *
  * Description:
  * Record a name=value pair in the audit message specified by @buffer,
  * the value in @value is automatically formatted according to @type.
  */
 void audit_log_field(buffer, name, type, length, value);

So the following code snippet would record a UID and PATH fields:

 audit_log_field(buffer, AUD_NAME_UID, sizeof(my_uid), my_uid);
 audit_log_field(buffer, AUD_NAME_PATH, my_path_len, my_path);

The way I see it, there are three main advantages to this approach: first, the 
caller must use a well defined field name, second, the caller is not 
responsible for any of the message formatting, third, we free the API from 
dealing with audit messages in a particular format (i.e. the message no 
longer needs to be in a string representation, it could be in a binary 
format ... or both).  Done right, an API similar to the one above should 
solve a good deal of the variability in the audit messages while at the same 
time opening the door for future optimizations.

...
 Auparse is not the answer:
 --------------------------

 Auparse is not the answer to irregular kernel audit message
 formatting. First of all it forces auparse to have special case logic
 which is not 100% robust and is tied to the kernel source code
 version. 
I also agree.

...
 While we're at it:
 ------------------

 If we do fix the format of audit messages we might as well fix some
 other inconsistencies at the same time.

 1) The initial part of AVC messages do not follow the standard
     name=value formatting used everywhere else in audit. 
I believe there are long standing issues here, revolving around SELinux's 
desire to emit AVC messages regardless of the state of audit.  I can't say 
I've payed too much attention to this in the past but there should be plenty 
of info in the archives (check the SELinux list too).

You might be better off dealing with the above API changes first then dealing 
with this.

...
 What has to change and what's optional:
 ---------------------------------------

 The formatting of name/value pairs in the kernel must be fixed, it is
 simply impossible to correctly parse in it's current state.

 The rest of the suggested changes are syntactic sugar which would make
 parsing easier because of regular syntax, but they are not
 critical. We could retain the existing formats if backwards
 compatibility is felt to trump syntactic cleanliness and ease in
 parsing. It's a judgment call over when and how to introduce change
 and the anticipated impact. 
All reasons for why I think we need to remove as much of the formatting 
decisions from the caller.

-- 
paul moore
linux security @ hp

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

Re: Kernel audit output is inconsistent, hard to parse