Re: Preferred subj= with multiple LSMs

Wednesday, 17 July 2019

On 7/17/2019 9:23 AM, Paul Moore wrote:
...
 On Wed, Jul 17, 2019 at 11:49 AM Casey Schaufler <casey(a)schaufler-ca.com> wrote:
> On 7/17/2019 5:14 AM, Paul Moore wrote:
>> On Tue, Jul 16, 2019 at 7:47 PM Casey Schaufler <casey(a)schaufler-ca.com> wrote:
>>> On 7/16/2019 4:13 PM, Paul Moore wrote:
>>>> On Tue, Jul 16, 2019 at 6:18 PM Casey Schaufler <casey(a)schaufler-ca.com> wrote:
>>>>> It sounds as if some variant of the Hideous format:
>>>>>
>>>>>         subj=selinux='a:b:c:d',apparmor='z'
>>>>>         subj=selinux/a:b:c:d/apparmor/z
>>>>>         subj=(selinux)a:b:c:d/(apparmor)z
>>>>>
>>>>> would meet Steve's searchability requirements, but with significant
>>>>> parsing performance penalties.
>>>> I think "hideous format" sums it up nicely.  Whatever we choose here
>>>> we are likely going to be stuck with for some time and I'm near to
>>>> 100% that multiplexing the labels onto a single field is going to be a
>>>> disaster.
>>> If the requirement is that subj= be searchable I don't see much of
>>> an alternative to a Hideous format. If we can get past that, and say
>>> that all subj_* have to be searchable we can avoid that set of issues.
>>> Instead of:
>>>
>>>         s = strstr(source, "subj=")
>>>         search_after_subj(s, ...);
>> This example does a lot of hand waving in search_after_subj(...)
>> regarding parsing the multiplexed LSM label.  Unless we restrict the
>> LSM label formats (which seems both wrong, and too late IMHO)
> I don't think it's too late, and I think it would be healthy
> to restrict LSM "contexts" to character sets that make command
> line specification possible. Embedded newlines? Ewwww.
 That would imply that the delimiter you would choose for the
 multiplexed approach would be something odd (I think you suggested
 0x02, or similar, earlier) which would likely require the multiplexed
 subj field to become a hex encoded field which would be very
 unfortunate in my opinion and would technically break with the current
 subj/obj field format spec.  Picking a normal-ish delimiter, and
 restricting its use by LSMs seems wrong to me. 
Just say "no" to hex encoding! BTW, keys are not hex encoded.

We've never had to think about having general rules on
what security modules do before, because with only one
active each could do whatever it wanted without fear of
conflict. If there is already a character that none of
the existing modules use, how would it be wrong to
reserve it?

Smack disallows the four characters '"/\ because quoting
is too important to ignore and the likelihood that someone
would confuse labels with paths seemed great. I sniffed
around a little, but couldn't find the sets for SELinux or
AppArmor.
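
To make the "reserve a character" idea concrete, here is a minimal
sketch of the check it implies. The helper name label_avoids() is
hypothetical, not an existing kernel or audit function; the reserved
set shown is just the four characters mentioned above for Smack.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The four characters mentioned above as disallowed by Smack: ' " / \ */
static const char smack_reserved[] = "'\"/\\";

/*
 * Hypothetical helper: true if `label` contains none of the
 * characters listed in `reserved`.
 */
static bool label_avoids(const char *label, const char *reserved)
{
        assert(label != NULL && reserved != NULL);
        return strpbrk(label, reserved) == NULL;
}
```

With this, label_avoids("a:b:c:d", smack_reserved) is true, while
label_avoids("a:b:c:d", ":") is false. That is the whole question in
miniature: a delimiter is only "safe" if every module's labels pass
the check for it.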

...
 It's also worth noting that if you were to move subj/obj to hex
 encoded fields, in addition to causing a backwards compatibility
 problem, you completely kill the ability to look at the raw log data
 and make sense of the fields ... well, unless you can do the ascii hex
 conversion in your head on the fly. 
Agreed, even though there was a time when I could do
hex decoding in both ASCII and EBCDIC on the fly.

...
>>  we have
>> a parsing nightmare; can you write a safe multiplexed LSM label parser
>> without knowledge of each LSM label format?  Can you do that for each
>> LSM without knowing their loaded policy?  What happens when the policy
>> and/or label format changes?  What happens in a few years when another
>> LSM is added to the kernel?
> I was intentionally hand-wavy because of those very issues.
 Then you should already realize why this is a terrible idea ;) 
Unfortunately, I'm facing two options, one of which the
kernel maintainer thinks is a bad idea and the other the
user space maintainer thinks is a bad idea. Plus, I'm not
very happy with either, either.

...
> Steve says that parsing is limited to "strstr()", so looking for
> ":s7:" in the subject should work just as well with a Hideous
> format as it does today, with the exception of false positives
> where LSMs have label string overlaps.
 Today when you go to search through your audit log you know that a
 single LSM is providing subj labels, and you also know which LSM that
 happens to be, so searching on a given string, or substring, is easy
 and generally safe.  In a multiplexed approach this becomes much more
 difficult, and depending on the search being done it could be
 misleading, perhaps even dangerous with complicated searches that
 exclude label substrings. 
I'm aware of this issue, which is one of the reasons I'm
asking about the preferred approach.
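
For illustration, a sketch of the false positive being described,
using the first Hideous variant from earlier in the thread. The
function name subj_contains() and the sample records are made up for
this example; note that the search also runs past the end of the
subj= field, which is a further limitation of the naive approach.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Naive search: does `needle` occur anywhere after "subj=" in the
 * record? It does not know which LSM a matching substring belongs
 * to, and it does not stop at the end of the field.
 */
static bool subj_contains(const char *record, const char *needle)
{
        const char *s;

        assert(record != NULL && needle != NULL);
        s = strstr(record, "subj=");
        return s != NULL && strstr(s + strlen("subj="), needle) != NULL;
}
```

Given a record containing subj=selinux='a:b:c:s0',apparmor='p:s7:q',
a search for ":s7:" succeeds even though the selinux part of the
field does not contain it. A search that excludes records matching
":s7:" would drop that record for the same reason.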

...
 It's important to remember that Steve's strstr() comment only reflects
 his set of userspace tools.  When you start talking about log
 aggregation and analytics, it seems very likely that there are other
 tools in use, likely with their own parsers that do much more
 complicated searches than a simple strstr() call. 
Point. But long term, they'll have to be updated to accommodate
whatever we decide on. Which makes the "simple" case, where one
security module is in use all the more important.

...
> Where is the need to use a module specific label parser coming
> from? Does the audit code parse SELinux contexts now?
 If you can't pick a "safe" delimiter that isn't included in any of the
 LSM label formats, how else do you know how to parse the multiplexed
 mess? 
Ah, but if we can ...

...
>>> we have
>>>
>>>         s = source
>>>         for (i = 0; i < lsm_slots ; i++) {
>>>                 s = strstr(s, "subj_")
>>>                 if (!s)
>>>                         break;
>>>                 s = search_after_subj_(s, lsm_slot_name[i], ...)
>> The hand waving here in search_after_subj_(...) is much less;
>> essentially you just match "subj_X" and then you can take the field
>> value as the LSM's label without having to know the format, the policy
>> loaded, etc.  It is both safer and doesn't require knowledge of the
>> LSMs (the LSM "name" can be specified as a parameter to the search
>> tool).
> You can do that with the Hideous format as well. I wouldn't
> say which would be easier without delving into the audit user
> space.
 No, you can't.  You still need to parse the multiplexed mess, that's
 the problem. 
You move the parsing problem to the record, where you have to
look for subj_selinux= instead of having the parsing problem in
the subj= field, where you look for something like selinux=
within the field. Neither looks like the work of an afternoon to
get right.
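
A rough sketch of the record-level lookup, to show what "look for
subj_selinux=" involves. The name find_subj_field() is hypothetical,
and the code assumes a space-separated key=value record whose values
contain no spaces, which real audit records may not guarantee.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: locate the value of "subj_<lsm>=" in a
 * space-separated key=value record. Returns a pointer to the value
 * and stores its length in *len, or returns NULL if the field is
 * absent. Assumes values contain no spaces.
 */
static const char *find_subj_field(const char *record, const char *lsm,
                                   size_t *len)
{
        char key[64];
        const char *s = record;
        int n;

        assert(record != NULL && lsm != NULL && len != NULL);
        n = snprintf(key, sizeof(key), "subj_%s=", lsm);
        if (n < 0 || (size_t)n >= sizeof(key))
                return NULL;    /* LSM name too long for this sketch */

        while ((s = strstr(s, key)) != NULL) {
                /* Accept only matches that begin a field. */
                if (s == record || s[-1] == ' ') {
                        s += n;
                        *len = strcspn(s, " ");
                        return s;
                }
                s += n;
        }
        return NULL;
}
```

The caller supplies the LSM name and gets the label back as an opaque
string, without interpreting its contents. The field-boundary check
and the no-spaces assumption are where the real work would be.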

It probably looks like I'm arguing for the Hideous format option.
That would require less work and code disruption, so it is tempting
to push for it. But I would have to know the user space side a
whole lot better than I do to feel good about pushing anything that
isn't obviously a good choice. I kind of prefer Paul's "subj=?"
approach, but as it's harder, I don't want to spend too much time
on it if it gets me a big, juicy, well deserved NAK.
