Re: [RFC][PATCH] audit: add feature audit_lost reset

Friday, 9 December 2016

On Fri, Dec 9, 2016 at 2:00 AM, Richard Guy Briggs <rgb(a)redhat.com&gt; wrote:
...
 On 2016-12-08 09:05, Paul Moore wrote:
> On Wed, Dec 7, 2016 at 10:53 PM, Richard Guy Briggs <rgb(a)redhat.com&gt; wrote:
> > On 2016-12-07 18:45, Paul Moore wrote:
> >> On Wed, Dec 7, 2016 at 6:30 PM, Steve Grubb <sgrubb(a)redhat.com&gt;
wrote:
> >> > On Wednesday, December 7, 2016 6:10:49 PM EST Paul Moore wrote:
> >> >> On Wed, Dec 7, 2016 at 10:58 AM, Richard Guy Briggs
<rgb(a)redhat.com&gt; wrote:
> >> >> > On 2016-12-07 10:53, Steve Grubb wrote:
> >> >> >> On Wednesday, December 7, 2016 10:05:30 AM EST Paul Moore
wrote:
> >> >> >> > On Tue, Dec 6, 2016 at 10:32 PM, Richard Guy Briggs
<rgb(a)redhat.com&gt;
> >> > wrote:
> >> >> >> > > On 2016-12-06 19:17, Paul Moore wrote:
> >> >> >> > >> On Tue, Dec 6, 2016 at 12:13 AM, Richard Guy
Briggs <rgb(a)redhat.com&gt;
> >> >> >> > >> Okay, back up ... this whole mess about
atomic_xchg() was always
> >> >> >> > >> unrelated to my original suggestion,
let's focus on my original
> >> >> >> > >> comment ... don't reset the counter on a
AUDIT_GET, reset it on a
> >> >> >> > >> AUDIT_SET with an AUDIT_STATUS_LOST, does
that make sense?
> >> >> >> > >
> >> >> >> > > I understood that.  It sounds like a nice simple
and straightforward
> >> >> >> > > method to do it but for the question of
accuracy.  Please rewind to
> >> >> >> > > my
> >> >> >> > > fundamental point: How do we get an accurate
reading of the last
> >> >> >> > > value
> >> >> >> > > of audit_lost before resetting it?
> >> >> >> >
> >> >> >> > Okay, I thought you were worried about a different
race, which is why
> >> >> >> > this discussion wasn't making much sense to me. 
I understand your
> >> >> >> > point, but I really dislike the API; although
that's not your fault,
> >> >> >> > it's really the only way to do it via AUDIT_GET.
> >> >> >> >
> >> >> >> > I'd much prefer we go with the cleaner AUDIT_SET
approach and just not
> >> >> >> > worry about the small race window.  It would only be
an issue if you
> >> >> >> > reset the count under heavy audit load, and why would
you reset the
> >> >> >> > lost value if you were under a heavy audit load? 
That just doesn't
> >> >> >> > make sense.
> >> >> >> >
> >> >> >> > I suppose we should hear from Steve on this since he
was the one who
> >> >> >> > has been asking for this feature, although I'm
pretty sure I know what
> >> >> >> > he is going to say.
> >> >> >>
> >> >> >> To start with, this request comes from users of the audit
system. I just
> >> >> >> passed along the request. The issue is that when you do
auditctl -s, you
> >> >> >> get the number of records lost. If you do it the next day,
you have to
> >> >> >> do math to see what the one day delta is. So, to make
reporting easy,
> >> >> >> they want it to be reset whenever they do audictl -s.
> >> >> >>
> >> >> >> You could also make a AUDIT_GET_RESET that gets the status
and resets the
> >> >> >> number atomically. Then I can add another commandline
option to auditctl
> >> >> >> that allows an admin to say also reset the counters. If
that command
> >> >> >> line option is passed, I call AUDIT_GET_RESET otherwise I
call
> >> >> >> AUDIT_GET. Thought?>
> >> >> > This would be slightly simpler in kernel implementation than
the method
> >> >> > I proposed and would work fine, off the top of my head.
> >> >>
> >> >> I'd prefer not to introduce another command message type for
something
> >> >> small like this.
> >> >>
> >> >> Steve, do you have any objection to the AUDIT_SET based approach?
> >> >
> >> > Either way, we'd need a feature flag so that I can tell if the
kernel supports
> >> > this or not.
> >>
> >> I think we are okay without a specific feature flag as sending a
> >> AUDIT_SET/reset on an old kernel will be harmless; it won't do
> >> anything, but it shouldn't return an error either.
> >
> > Ok, so userspace is still left wondering if it worked until the next
> > time it reads that value, and even then it can't be certain if that
> > value is the same or higher than it was when userspace thought it reset
> > that value.  At this point, an old kernel will not return an error and
> > simply ignore any new AUDIT_STATUS_* flag since each flag is treated
> > independently and extra flags are ignored and not blocked.  This seems
> > sloppy since we have two ways of fixing this uncertainty pretty easily.
>
> Comments like the above aren't helpful, they are just annoying.  The
> drawbacks to the AUDIT_SET/reset approach have already been discussed
> on this thread, if you want to do something constructive think about
> how you can resolve these limitations within the context of others
> comments/feedback.

 I'm sorry to offend.  That wasn't my intent.  I was trying to bring new
 information and observations into the discussion.  Is there anything
 factually wrong in my observations other than the opinion of it being
 sloppy? 
Your email wasn't offensive, maybe a little snarky at times, but this
is an open mailing list and developers are nothing if not occasionally
snarky (at least I know I am); that wasn't my objection.

My objection to that first paragraph was that you weren't bringing
anything new to this thread (in my opinion), you were simply
regurgitating the same arguments without any thought of how to solve
these concerns within the context of our ongoing discussion.  I tried
to point you in a more constructive direction with my last email, but
perhaps I did a poor job, let me try again ... Opposing viewpoints,
especially when technical matters are concerned, is a good thing, but
ultimately a decision needs to be made if we are to solve our problems
and move forward; it is nice if that decision is the result of
unanimous vote, but it is foolish to expect that, or delay a fix,
waiting for consensus.  When you present a possible solution, and the
solution is rejected, in whole or in part, look at the other proposed
solutions: do they resolve all your concerns?  If yes, wonderful, work
to embrace that solution.  If not, try to resolve your concerns within
the context of this other proposal; don't continue to push something
which has already been rejected.  The ultimate goal should be to solve
the problem, not to merge a specific solution; the "best" solution for
the problem is always the one that gets merged.

Hopefully this helps, because I don't know how else to explain this,
and to be quite frank, I'm growing tired of talking about this.

...
> I've already mentioned that I didn't like the
AUDIT_GET/reset approach
> because I thought the interface was bad.  As I'm sure you know, the
> audit kernel/userspace interface is a bit of a hot-button topic with
> me; I think it has a lot of problems and I'm very intent on not making
> it worse (in my opinion, I will admit that API design is not entirely
> objective).  Continuing to argue for a interface design that I've
> already expressed a dislike for is not likely to win me over to your
> side; regardless of the outcome you will end up frustrating both
> yourself and the maintainer, neither are good things.

 I've re-read the thread from the beginning.  I guess I must have missed
 what was the fundamental problem with the AUDIT_GET/reset method other
 than taste.  What don't you like about the API that precludes
 using/abusing it this way? 
A few things come to mind immediately: abusing GET this way
effectively makes it a "write" operation which can make access control
policy difficult, having to bracket the GET with a FEATURE operation
is ugly and potentially racy in its own way, complexity compared to
other solutions.

...
> You feel very strongly that
> the window is of grave concern, Steve and myself much less so.  If you
> still feel strongly about this, think about some different ways in
> which you can avoid losing a lost message counter bump.  Off the top
> of my head, there are really only two ways for the kernel's audit
> subsystem to send information back to userspace in this case, via a
> netlink return/error message or an audit record.  We could possibly do
> something with the netlink error message by returning the lost counter
> as a positive integer (negative integer is a failure code, zero is
> success), but that might get tricky in the future, although we could
> mitigate that risk by forcing the AUDIT_SET/reset to happen by itself
> (in other words, don't simply check to see if the bit is set in the
> bitmask, e.g. (s.mask & AUDIT_STATUS_LOST), check to see it is equal,
> e.g. (s.mask == AUDIT_STATUS_LOST)).

 This could work.  What risk do you see in doing it with other flags?
 That another set failure could usurp the return code?  If so, yes, I
 agree with requiring it to be a lone flag. 
Possible conflicts with other SET operations failing as well as
potential future use.  We can always relax the restriction in the
future, we can't make it more restrictive.

...
> There you go, two possible solutions for eliminating/mitigating
the
> potential race while sticking with the simpler AUDIT_SET/reset
> interface.  I suppose you could even implement both of the solutions
> above, they aren't mutually exclusive; that would depend on what
> Steve/userspace would prefer.  Finally, as for the feature bitmap to
> signal to userspace that we support this new feature: if you can't
> live without it, go ahead and add it in.  As I said before, I'm a
> little concerned at the rate we are consuming this bitmap, but I'll
> admit we still have plenty of room before we have to start worrying
> about alternatives.

 I would suggest that the return value (presuming it was reset when
 non-zero) or the audit record generated reporting the lost value
 reset would be sufficient confirmation that the feature exists on the
 running kernel and the addition to the feature bitmap is not strictly
 necessary, but you only find this out upon attempting that lost reset.

 Well, we haven't used much of that bitmap space and if it isn't to be
 used when needed, why is it there?  If there is a relatively simple
 alternate non-destructive way to discover the presence of a feature use
 of the bitmap isn't necessary. 
My concern isn't the absolute consumption of the bitmap, but rather
the rate of the consumption.

-- 
paul moore
www.paul-moore.com

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

Re: [RFC][PATCH] audit: add feature audit_lost reset