Re: Preferred subj= with multiple LSMs

Wednesday, 17 July 2019

On Wed, Jul 17, 2019 at 11:49 AM Casey Schaufler <casey(a)schaufler-ca.com> wrote:
...
 On 7/17/2019 5:14 AM, Paul Moore wrote:
 > On Tue, Jul 16, 2019 at 7:47 PM Casey Schaufler <casey(a)schaufler-ca.com> wrote:
 >> On 7/16/2019 4:13 PM, Paul Moore wrote:
 >>> On Tue, Jul 16, 2019 at 6:18 PM Casey Schaufler <casey(a)schaufler-ca.com> wrote:
 >>>> It sounds as if some variant of the Hideous format:
 >>>>
 >>>>         subj=selinux='a:b:c:d',apparmor='z'
 >>>>         subj=selinux/a:b:c:d/apparmor/z
 >>>>         subj=(selinux)a:b:c:d/(apparmor)z
 >>>>
 >>>> would meet Steve's searchability requirements, but with significant
 >>>> parsing performance penalties.
 >>> I think "hideous format" sums it up nicely.  Whatever we choose here
 >>> we are likely going to be stuck with for some time and I'm near to
 >>> 100% sure that multiplexing the labels onto a single field is going to be a
 >>> disaster.
 >> If the requirement is that subj= be searchable I don't see much of
 >> an alternative to a Hideous format. If we can get past that, and say
 >> that all subj_* have to be searchable we can avoid that set of issues.
 >> Instead of:
 >>
 >>         s = strstr(source, "subj=");
 >>         search_after_subj(s, ...);
 > This example does a lot of hand waving in search_after_subj(...)
 > regarding parsing the multiplexed LSM label.  Unless we restrict the
 > LSM label formats (which seems both wrong, and too late IMHO)

 I don't think it's too late, and I think it would be healthy
 to restrict LSM "contexts" to character sets that make command
 line specification possible. Embedded newlines? Ewwww.

That would imply that the delimiter you would choose for the
multiplexed approach would be something odd (I think you suggested
0x02, or similar, earlier) which would likely require the multiplexed
subj field to become a hex encoded field which would be very
unfortunate in my opinion and would technically break with the current
subj/obj field format spec.  Picking a normal-ish delimiter, and
restricting its use by LSMs seems wrong to me.

It's also worth noting that if you were to move subj/obj to hex
encoded fields, in addition to causing a backwards compatibility
problem, you completely kill the ability to look at the raw log data
and make sense of the fields ... well, unless you can do the ascii hex
conversion in your head on the fly.
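
For a sense of what that costs in readability, here is a minimal sketch of
hex encoding applied to the toy label a:b:c:d from earlier in the thread.
The function name hex_encode is made up for this example, and the uppercase,
two-digits-per-byte output is an assumption about the encoding, not a
statement of what the kernel would emit.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: hex encode `in` into `out` as uppercase ASCII hex,
 * two digits per input byte.
 * Returns 0 on success, -1 if `out` is too small.
 */
int hex_encode(const char *in, char *out, size_t outlen)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t len = strlen(in);
    size_t i;

    /* Two output characters per byte, plus the terminating NUL. */
    if (outlen < 2 * len + 1)
        return -1;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];

        out[2 * i] = digits[c >> 4];
        out[2 * i + 1] = digits[c & 0x0f];
    }
    out[2 * len] = '\0';
    return 0;
}
```

With this sketch, a:b:c:d comes out as 613A623A633A64, which is not something
most people can read off a raw log line.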

...
 >  we have
 > a parsing nightmare; can you write a safe multiplexed LSM label parser
 > without knowledge of each LSM label format?  Can you do that for each
 > LSM without knowing their loaded policy?  What happens when the policy
 > and/or label format changes?  What happens in a few years when another
 > LSM is added to the kernel?

 I was intentionally hand-wavy because of those very issues.

Then you should already realize why this is a terrible idea ;)

...
 Steve says that parsing is limited to "strstr()", so looking for
 ":s7:" in the subject should work just as well with a Hideous
 format as it does today, with the exception of false positives
 where LSMs have label string overlaps.

Today when you go to search through your audit log you know that a
single LSM is providing subj labels, and you also know which LSM that
happens to be, so searching on a given string, or substring, is easy
and generally safe.  In a multiplexed approach this becomes much more
difficult, and depending on the search being done it could be
misleading, perhaps even dangerous with complicated searches that
exclude label substrings.
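
To make the substring problem concrete, here is a minimal sketch in C. The
record text, the labels, and the helper name subj_contains are all made up
for illustration, and the multiplexed layout is the first of the Hideous
variants quoted above, not a real audit format. The helper also assumes the
subj= value ends at the first space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report whether the subj= field of an audit record
 * contains the substring `needle`. The field is assumed to run from just
 * after "subj=" to the next space (or the end of the record).
 */
bool subj_contains(const char *record, const char *needle)
{
    const char *start = strstr(record, "subj=");
    const char *end, *hit;

    if (!start)
        return false;
    start += strlen("subj=");

    end = strchr(start, ' ');
    if (!end)
        end = start + strlen(start);

    /* The first match after `start` must lie entirely inside the field. */
    hit = strstr(start, needle);
    return hit && hit + strlen(needle) <= end;
}
```

Given a single-LSM record with subj=system_u:system_r:httpd_t:s0, a search
for "unconfined" does not match. Given a multiplexed record with
subj=selinux='system_u:system_r:httpd_t:s0',apparmor='unconfined', the same
search matches because of the AppArmor label, even though the SELinux domain
is httpd_t. A search meant to exclude unconfined SELinux subjects would
silently drop that record.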

It's important to remember that Steve's strstr() comment only reflects
his set of userspace tools.  When you start talking about log
aggregation and analytics, it seems very likely that there are other
tools in use, likely with their own parsers that do much more
complicated searches than a simple strstr() call.

...
 Where is the need to use a module specific label parser coming
 from? Does the audit code parse SELinux contexts now?

If you can't pick a "safe" delimiter that isn't included in any of the
LSM label formats, how else do you know how to parse the multiplexed
mess?

...
 >> we have
 >>
 >>         s = source;
 >>         for (i = 0; i < lsm_slots; i++) {
 >>                 s = strstr(s, "subj_");
 >>                 if (!s)
 >>                         break;
 >>                 s = search_after_subj_(s, lsm_slot_name[i], ...);
 > The hand waving here in search_after_subj_(...) is much less;
 > essentially you just match "subj_X" and then you can take the field
 > value as the LSM's label without having to know the format, the policy
 > loaded, etc.  It is both safer and doesn't require knowledge of the
 > LSMs (the LSM "name" can be specified as a parameter to the search
 > tool).

 You can do that with the Hideous format as well. I wouldn't
 say which would be easier without delving into the audit user
 space.

No, you can't.  You still need to parse the multiplexed mess, that's
the problem.
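
By contrast, a lookup on per-LSM fields can be sketched without any knowledge
of the label format. This is only an illustration: the field name pattern
subj_<lsm>=, the space-separated record layout with unquoted values, and the
helper name lsm_subj are assumptions, not an agreed format.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical lookup: copy the value of the "subj_<lsm>=" field into `out`.
 * Returns 0 on success, -1 if the field is missing or does not fit.
 * The value is taken verbatim up to the next space; it is never parsed.
 */
int lsm_subj(const char *record, const char *lsm, char *out, size_t outlen)
{
    char key[64];
    const char *p, *end;
    size_t len;
    int n;

    n = snprintf(key, sizeof(key), "subj_%s=", lsm);
    if (n < 0 || (size_t)n >= sizeof(key))
        return -1;

    /* Only accept the key at the start of the record or after a space. */
    for (p = strstr(record, key); p; p = strstr(p + 1, key))
        if (p == record || p[-1] == ' ')
            break;
    if (!p)
        return -1;

    p += n;
    end = strchr(p, ' ');
    len = end ? (size_t)(end - p) : strlen(p);
    if (len >= outlen)
        return -1;

    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

The LSM name is just a parameter here, so a record carrying
subj_selinux=system_u:system_r:httpd_t:s0 and subj_apparmor=unconfined can be
queried for either label, and a newly added LSM needs no change to the lookup.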

-- 
paul moore
www.paul-moore.com
