On Tuesday 29 January 2008 5:56:36 pm John Dennis wrote:
The format of audit messages from the kernel is a mess. The
bottom line is one cannot parse the audit messages without special
case knowledge of each audit message because the data formatting does
not follow any regular rules.
I don't know how it got this way, but it really needs to be fixed.
I agree.
Suggested Fix:
--------------
Most of these problems can easily be fixed if there is exactly one
central place to format an audit field value. The function
audit_log_vformat() could very easily ensure consistent formatting via
% format specifiers in the format string, e.g.:
audit_log_format("n=%d path=%s", n, path);
Building audit output piecewise would be deprecated, e.g. these types
of sequences would be eliminated.
audit_log_format(ab, " n=%d", n);
audit_log_format(ab, " name=");
audit_log_foo();
and replaced with:
audit_log_format(ab, " n=%d name=%s", foo_to_string(foo));
Whenever audit_log_vformat() encounters a % format specifier it
formats to a string, then it converts the string to an escaped quoted
string, and then inserts the escaped quoted string into the buffer
(e.g. n="123" name="foo bar\n" )
This way the formatting is consistent, easy to apply, and is never
special cased by the caller.
I'm afraid that your proposed solution is not enough as it still relies on the
caller for the overall message format. I'm a firm believer that if you have
strict requirements that need to be met (I think we do, your concerns are not
new) you need to enforce those requirements through the API, relying on
others to adhere to a "gentlemen's agreement" regarding the careful use of a
flexible API is always going to be problematic.
What I have in mind is an API similar to the following:
/* Audit field names */
enum {
AUD_NAME_UID,
AUD_NAME_GUID,
AUD_NAME_PATH,
... etc ...
}
/* Audit field value types */
enum {
AUD_TYPE_BOOL,
AUD_TYPE_U32,
AUD_TYPE_S32,
AUD_TYPE_U64,
AUD_TYPE_S64,
AUD_TYPE_STRING,
AUD_TYPE_STRING_RAW,
... etc ...
} audit_value_type;
/**
* audit_log_field - Record a name=value field in the audit message
* @buffer: Audit buffer/message
* @name: Audit message field name
* @type: Audit message field value type
* @length: Audit message field value length
* @value: Audit message field value
*
* Description:
* Record a name=value pair in the audit message specified by @buffer,
* the value in @value is automatically formatted according to @type.
*/
void audit_log_field(buffer, name, type, length, value);
So the following code snippet would record a UID and PATH fields:
audit_log_field(buffer, AUD_NAME_UID, sizeof(my_uid), my_uid);
audit_log_field(buffer, AUD_NAME_PATH, my_path_len, my_path);
The way I see it, there are three main advantages to this approach: first, the
caller must use a well defined field name, second, the caller is not
responsible for any of the message formatting, third, we free the API from
dealing with audit messages in a particular format (i.e. the message no
longer needs to be in a string representation, it could be in a binary
format ... or both). Done right, an API similar to the one above should
solve a good deal of the variability in the audit messages while at the same
time opening the door for future optimizations.
Auparse is not the answer:
--------------------------
Auparse is not the answer to irregular kernel audit message
formatting. First of all it forces auparse to have special case logic
which is not 100% robust and is tied to the kernel source code
version.
I also agree.
While we're at it:
------------------
If we do fix the format of audit messages we might as well fix some
other inconsistencies at the same time.
1) The initial part of AVC messages do not follow the standard
name=value formatting used everywhere else in audit.
I believe there are long standing issues here, revolving around SELinux's
desire to emit AVC messages regardless of the state of audit. I can't say
I've payed too much attention to this in the past but there should be plenty
of info in the archives (check the SELinux list too).
You might be better off dealing with the above API changes first then dealing
with this.
What has to change and what's optional:
---------------------------------------
The formatting of name/value pairs in the kernel must be fixed, it is
simply impossible to correctly parse in it's current state.
The rest of the suggested changes are syntactic sugar which would make
parsing easier because of regular syntax, but they are not
critical. We could retain the existing formats if backwards
compatibility is felt to trump syntactic cleanliness and ease in
parsing. It's a judgment call over when and how to introduce change
and the anticipated impact.
All reasons for why I think we need to remove as much of the formatting
decisions from the caller.
--
paul moore
linux security @ hp