Re: [RFC][PATCH] audit: add feature audit_lost reset

Thursday, 8 December 2016

On Wed, Dec 7, 2016 at 10:53 PM, Richard Guy Briggs <rgb(a)redhat.com&gt; wrote:
...
 On 2016-12-07 18:45, Paul Moore wrote:
> On Wed, Dec 7, 2016 at 6:30 PM, Steve Grubb <sgrubb(a)redhat.com&gt; wrote:
> > On Wednesday, December 7, 2016 6:10:49 PM EST Paul Moore wrote:
> >> On Wed, Dec 7, 2016 at 10:58 AM, Richard Guy Briggs <rgb(a)redhat.com&gt;
wrote:
> >> > On 2016-12-07 10:53, Steve Grubb wrote:
> >> >> On Wednesday, December 7, 2016 10:05:30 AM EST Paul Moore wrote:
> >> >> > On Tue, Dec 6, 2016 at 10:32 PM, Richard Guy Briggs
<rgb(a)redhat.com&gt;
> > wrote:
> >> >> > > On 2016-12-06 19:17, Paul Moore wrote:
> >> >> > >> On Tue, Dec 6, 2016 at 12:13 AM, Richard Guy Briggs
<rgb(a)redhat.com&gt;
> >> >> > >> Okay, back up ... this whole mess about atomic_xchg()
was always
> >> >> > >> unrelated to my original suggestion, let's focus
on my original
> >> >> > >> comment ... don't reset the counter on a
AUDIT_GET, reset it on a
> >> >> > >> AUDIT_SET with an AUDIT_STATUS_LOST, does that make
sense?
> >> >> > >
> >> >> > > I understood that.  It sounds like a nice simple and
straightforward
> >> >> > > method to do it but for the question of accuracy.  Please
rewind to
> >> >> > > my
> >> >> > > fundamental point: How do we get an accurate reading of
the last
> >> >> > > value
> >> >> > > of audit_lost before resetting it?
> >> >> >
> >> >> > Okay, I thought you were worried about a different race, which
is why
> >> >> > this discussion wasn't making much sense to me.  I
understand your
> >> >> > point, but I really dislike the API; although that's not
your fault,
> >> >> > it's really the only way to do it via AUDIT_GET.
> >> >> >
> >> >> > I'd much prefer we go with the cleaner AUDIT_SET approach
and just not
> >> >> > worry about the small race window.  It would only be an issue
if you
> >> >> > reset the count under heavy audit load, and why would you
reset the
> >> >> > lost value if you were under a heavy audit load?  That just
doesn't
> >> >> > make sense.
> >> >> >
> >> >> > I suppose we should hear from Steve on this since he was the
one who
> >> >> > has been asking for this feature, although I'm pretty sure
I know what
> >> >> > he is going to say.
> >> >>
> >> >> To start with, this request comes from users of the audit system. I
just
> >> >> passed along the request. The issue is that when you do auditctl
-s, you
> >> >> get the number of records lost. If you do it the next day, you have
to
> >> >> do math to see what the one day delta is. So, to make reporting
easy,
> >> >> they want it to be reset whenever they do audictl -s.
> >> >>
> >> >> You could also make a AUDIT_GET_RESET that gets the status and
resets the
> >> >> number atomically. Then I can add another commandline option to
auditctl
> >> >> that allows an admin to say also reset the counters. If that
command
> >> >> line option is passed, I call AUDIT_GET_RESET otherwise I call
> >> >> AUDIT_GET. Thought?>
> >> > This would be slightly simpler in kernel implementation than the
method
> >> > I proposed and would work fine, off the top of my head.
> >>
> >> I'd prefer not to introduce another command message type for something
> >> small like this.
> >>
> >> Steve, do you have any objection to the AUDIT_SET based approach?
> >
> > Either way, we'd need a feature flag so that I can tell if the kernel
supports
> > this or not.
>
> I think we are okay without a specific feature flag as sending a
> AUDIT_SET/reset on an old kernel will be harmless; it won't do
> anything, but it shouldn't return an error either.

 Ok, so userspace is still left wondering if it worked until the next
 time it reads that value, and even then it can't be certain if that
 value is the same or higher than it was when userspace thought it reset
 that value.  At this point, an old kernel will not return an error and
 simply ignore any new AUDIT_STATUS_* flag since each flag is treated
 independently and extra flags are ignored and not blocked.  This seems
 sloppy since we have two ways of fixing this uncertainty pretty easily. 
Comments like the above aren't helpful, they are just annoying.  The
drawbacks to the AUDIT_SET/reset approach have already been discussed
on this thread, if you want to do something constructive think about
how you can resolve these limitations within the context of others
comments/feedback.

I've already mentioned that I didn't like the AUDIT_GET/reset approach
because I thought the interface was bad.  As I'm sure you know, the
audit kernel/userspace interface is a bit of a hot-button topic with
me; I think it has a lot of problems and I'm very intent on not making
it worse (in my opinion, I will admit that API design is not entirely
objective).  Continuing to argue for a interface design that I've
already expressed a dislike for is not likely to win me over to your
side; regardless of the outcome you will end up frustrating both
yourself and the maintainer, neither are good things.

We all agree that there is a potential race window with respect to
reading/resetting the lost counter, where we disagree is the
likelihood of that happening in practice.  You feel very strongly that
the window is of grave concern, Steve and myself much less so.  If you
still feel strongly about this, think about some different ways in
which you can avoid losing a lost message counter bump.  Off the top
of my head, there are really only two ways for the kernel's audit
subsystem to send information back to userspace in this case, via a
netlink return/error message or an audit record.  We could possibly do
something with the netlink error message by returning the lost counter
as a positive integer (negative integer is a failure code, zero is
success), but that might get tricky in the future, although we could
mitigate that risk by forcing the AUDIT_SET/reset to happen by itself
(in other words, don't simply check to see if the bit is set in the
bitmask, e.g. (s.mask & AUDIT_STATUS_LOST), check to see it is equal,
e.g. (s.mask == AUDIT_STATUS_LOST)).  We could also mitigate the race
via an audit record by emitting a record indicating that the lost
counter was reset and record the lost counter (before the reset) in
that record; honestly, now that I'm writing this, it seems like
something we should be doing regardless, as tampering with the lost
counter seems like a security relevant event.

There you go, two possible solutions for eliminating/mitigating the
potential race while sticking with the simpler AUDIT_SET/reset
interface.  I suppose you could even implement both of the solutions
above, they aren't mutually exclusive; that would depend on what
Steve/userspace would prefer.  Finally, as for the feature bitmap to
signal to userspace that we support this new feature: if you can't
live without it, go ahead and add it in.  As I said before, I'm a
little concerned at the rate we are consuming this bitmap, but I'll
admit we still have plenty of room before we have to start worrying
about alternatives.

-- 
paul moore
www.paul-moore.com

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

Re: [RFC][PATCH] audit: add feature audit_lost reset