Backwards-compatible string encoding

Friday, 27 March 2009

Hello all,

I have just run into the problem that many of you have: trying to parse 
the audit logs.

Yesterday I read through the linux-audit mail archive.  Here are the 
related topics I have found:
  https://www.redhat.com/archives/linux-audit/2006-March/msg00093.html
  https://www.redhat.com/archives/linux-audit/2006-March/msg00158.html
  https://www.redhat.com/archives/linux-audit/2007-November/msg00036.html
  https://www.redhat.com/archives/linux-audit/2008-January/msg00082.html
  https://www.redhat.com/archives/linux-audit/2008-March/msg00024.html
  https://www.redhat.com/archives/linux-audit/2008-May/msg00029.html
  https://www.redhat.com/archives/linux-audit/2008-June/msg00005.html
  https://www.redhat.com/archives/linux-audit/2008-August/msg00078.html
  https://www.redhat.com/archives/linux-audit/2009-March/msg00018.html

From these I gather the following requirements (correct me if I am wrong):
- must be backwards-compatible (doesn't break user-space on FC2, etc)
- kernel does no verifying of incoming user-space strings
- kernel must output strings in a "simple" format (e.g. no XML :-)
- able to write a parser that guarantees all (relevant) input ends up in 
output
- use disk space efficiently
- handle UTF-8

Based on things other people have proposed, how does this sound:
- radix prefixes for any non-base10 number (I think audit mostly does 
this already?)
- hex-encode strings (and do not quote) if:
-- contains non-ASCII or non-printable characters
- quote strings if:
-- contains whitespace or '=' or '"' (in which case you have to output 
something like '\"')
-- entirely {hex,octal,base10} characters (so the string cannot be 
mistaken for a number)
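
To illustrate the radix-prefix point, here is a minimal user-space sketch. The names format_prefixed() and parse_prefixed() are made up for this example and are not existing audit functions. The idea is that if every non-decimal number carries a "0x" or "0" prefix, a parser can pass base 0 to strtoul() and let the prefix select the radix.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write value into buf with an explicit radix prefix.
 * base 16 gives "0x...", base 8 gives "0...", anything else is plain decimal.
 * Returns what snprintf() returns. */
int format_prefixed(char *buf, size_t buflen, unsigned long value, int base)
{
    assert(buf != NULL);
    switch (base) {
    case 16:
        return snprintf(buf, buflen, "0x%lx", value);
    case 8:
        return snprintf(buf, buflen, "0%lo", value);
    default:
        return snprintf(buf, buflen, "%lu", value);
    }
}

/* Hypothetical helper: parse a prefixed numeric field.
 * Base 0 tells strtoul() to pick hex, octal or decimal from the prefix.
 * Returns 0 on success, -1 if the field is not entirely a number. */
int parse_prefixed(const char *field, unsigned long *out)
{
    char *end;
    unsigned long v;

    errno = 0;
    v = strtoul(field, &end, 0);
    if (end == field || *end != '\0' || errno == ERANGE)
        return -1;
    *out = v;
    return 0;
}
```

With this, a field written as 0x1ff, 0777 or 511 parses to the same value, and a parser never has to guess the base from the field name.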

Or we could save ourselves a little more headache, at the cost of 
space/readability, and hex-encode on '=' and '"' too.  Looking at 
auparse, we may have to hex-encode any string with an embedded '"'.

Check if you need to encode first, then check for quoting.  Something 
like...

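// somewhere in kernel/audit.c ?
// hexencode() and quote() are placeholders for helpers yet to be written
char *audit_log_sane_string(const char *str, size_t slen)
{
   int quoteme = 0;
   size_t i, numhex = 0;

   for (i = 0; i < slen; i++) {
      unsigned char c = str[i]; // avoid negative values in ctype calls

      // non-printable or non-ASCII byte: hex-encode the whole string
      if (c < 0x20 || c > 0x7e) return hexencode(str, slen);
      if (isspace(c) || c == '=' || c == '"') quoteme = 1;
      if (isxdigit(c)) numhex++; // xdigit covers base8,10,16
   }

   // quote if it has special characters or looks like a number
   if (quoteme || numhex == slen) return quote(str, slen);

   return kstrndup(str, slen, GFP_KERNEL); // gfp flags depend on the caller...?
}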
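
For anyone who wants to try the rules outside the kernel, here is a self-contained user-space sketch of the same check. It is an illustration only: sane_string(), hexencode() and quote() are hypothetical names, upper-case hex is an arbitrary choice, and escaping '\' inside quoted strings is my own assumption (without it, a parser could not tell an escaped quote from a literal backslash followed by the closing quote).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: two upper-case hex digits per byte, no quotes. */
char *hexencode(const char *str, size_t slen)
{
    static const char digits[] = "0123456789ABCDEF";
    char *out = malloc(2 * slen + 1);
    size_t i;

    if (!out)
        return NULL;
    for (i = 0; i < slen; i++) {
        unsigned char c = (unsigned char)str[i];
        out[2 * i] = digits[c >> 4];
        out[2 * i + 1] = digits[c & 0x0f];
    }
    out[2 * slen] = '\0';
    return out;
}

/* Hypothetical helper: wrap in double quotes, escaping '"' and '\'. */
char *quote(const char *str, size_t slen)
{
    /* worst case: every byte escaped, plus two quotes and the NUL */
    char *out = malloc(2 * slen + 3);
    size_t i, o = 0;

    if (!out)
        return NULL;
    out[o++] = '"';
    for (i = 0; i < slen; i++) {
        if (str[i] == '"' || str[i] == '\\')
            out[o++] = '\\';
        out[o++] = str[i];
    }
    out[o++] = '"';
    out[o] = '\0';
    return out;
}

/* Decide between hex-encoding, quoting, and passing the string through.
 * The caller frees the result; NULL means allocation failed. */
char *sane_string(const char *str, size_t slen)
{
    int quoteme = 0;
    size_t i, numhex = 0;
    char *out;

    assert(str != NULL);
    for (i = 0; i < slen; i++) {
        unsigned char c = (unsigned char)str[i];

        /* control characters and any non-ASCII byte force hex */
        if (c < 0x20 || c > 0x7e)
            return hexencode(str, slen);
        if (c == ' ' || c == '=' || c == '"')
            quoteme = 1;
        if (isxdigit(c))
            numhex++;
    }
    /* all-hex-digit strings are quoted so they cannot look like numbers */
    if (quoteme || numhex == slen)
        return quote(str, slen);

    out = malloc(slen + 1);
    if (!out)
        return NULL;
    memcpy(out, str, slen);
    out[slen] = '\0';
    return out;
}
```

Under these assumptions, root is passed through unchanged, hello world and deadbeef come back wrapped in double quotes, and the UTF-8 bytes of "café" come back as 636166C3A9.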

Oh, and if anyone has ideas for making shadow-utils play nicer with 
audit, I possibly have that kind of time on my hands.  Also, getting rid 
of the extra punctuation [:(,)] would be great.

What do you all think?

Joshua Roys
