On Wednesday, February 10, 2021 2:11:55 PM EST LC Bruzenak wrote:
On Wed, Feb 10, 2021 at 1:07 PM LC Bruzenak
<lenny(a)magitekltd.com> wrote:
> On Mon, Feb 8, 2021 at 7:44 PM Steve Grubb <sgrubb(a)redhat.com> wrote:
>> Hello,
>>
>> I have recently checked two experimental plugins into the audit tree. You
>> can enable them by passing --enable-experimental to configure. One of the
>> new plugins is aimed at providing audit metrics to a statsd server. The
>> idea is that you can use this to relay the metrics to influxdb, prometheus,
>> or some other collector. Then you can use Grafana to visualize and alert.
>>
>> Currently, it supports the following metrics:
>>
>> kernel.audit.lost
>> kernel.audit.backlog
>> auditd.free_space
>> auditd.plugin_current_depth
>> auditd.plugin_max_depth
>> audit_events.total_count
>> audit_events.total_failed
>> audit_events.avc_count
>> audit_events.fanotify_count
>> audit_events.logins_failed
>> audit_events.logins_success
>> audit_events.anomaly_count
>> audit_events.response_count
>>
>> I'd be interested in hearing whether this would be useful, and whether
>> these are the right metrics that people are interested in. Should
>> something else be measured? Should an example Grafana dashboard be
>> included?
>>
>> Let me know what you think.
>>
>> -Steve
>
> Steve,
>
> I think this could be awesome; hoping to give it a try soon. An example
> dashboard would be very helpful if you could include that.
> The stats you already point out are a good start.
>
> I'd also like to have a way to parse the per-machine kernel-assigned
> event IDs for missing ones. Would that need a separate plugin, or could
> something be done within this setup?
This is not tracking event IDs, and I don't think that fits with performance
metrics. To do this, you'd need to keep track of all events coming in and
have some way of determining what's missing. That means keeping event state
around until some timeout, just in case a straggler comes through late.
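A minimal, purely illustrative Python sketch of that bookkeeping follows. It assumes each machine's audit serial numbers increase by one per event and ignores the reset at reboot; `SerialGapTracker` is a hypothetical name, not something in the audit tree. Serials that get skipped over become candidate gaps, a late straggler clears its entry, and only gaps still open after the timeout are reported.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SerialGapTracker:
    """Hypothetical helper that reports audit event serials that never arrived.

    Assumes serials from one machine increase by one per event and ignores
    the reset that happens at reboot.
    """
    timeout: float                      # seconds to wait for stragglers
    highest: Optional[int] = None       # highest serial seen so far
    pending: Dict[int, float] = field(default_factory=dict)  # serial -> time noticed missing

    def observe(self, serial: int, now: float) -> None:
        if self.highest is None:
            self.highest = serial
        elif serial > self.highest:
            # Every serial skipped over is a candidate gap.
            for missing in range(self.highest + 1, serial):
                self.pending[missing] = now
            self.highest = serial
        else:
            # A late straggler (or duplicate) arrived, so it is no longer missing.
            self.pending.pop(serial, None)

    def expired_gaps(self, now: float) -> List[int]:
        """Return serials still missing after the timeout and stop tracking them."""
        expired = sorted(s for s, t in self.pending.items() if now - t >= self.timeout)
        for s in expired:
            del self.pending[s]
        return expired
```

For example, after observing serials 100, 101, 104 and then a late 102, only 103 remains pending, and `expired_gaps` returns it once the timeout has elapsed. The pending map is exactly the "event state kept around until some timeout" cost mentioned above.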
> I'm pretty sure there are more metrics that would be desired as well as
> some derived ones; e.g., take a per-user login/logoff set to identify time
> spent on a particular machine (screenlocks notwithstanding, but maybe
> eventually).
I was hoping to hear from people who might currently be using Grafana or
Graphite whether anything else is needed. Do we need to namespace the
machines? If so, what is the best way based on experience? Is dot notation
or underscore notation better?
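To make the namespacing question concrete, here is a hypothetical Python sketch of the statsd plain-text gauge format (`<name>:<value>|g`, sent over UDP, conventionally to port 8125) with two ways of prefixing a host name. The function names and the `host`/`separator` options are illustrative assumptions, not what the plugin actually emits.

```python
import socket
from typing import Optional


def statsd_gauge_line(metric: str, value: int, host: Optional[str] = None,
                      separator: str = ".") -> str:
    """Format one statsd gauge line: "<name>:<value>|g".

    `host` and `separator` are hypothetical options showing two ways to
    namespace per-machine metrics. Dots inside the host name are replaced
    so they don't create extra hierarchy levels in Graphite-style backends.
    """
    if host:
        safe_host = host.replace(".", "_")
        metric = f"{safe_host}{separator}{metric}"
    return f"{metric}:{value}|g"


def send_gauge(line: str, server: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one statsd line as a UDP datagram (8125 is the conventional port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (server, port))
```

With `host="web01.example.com"`, `statsd_gauge_line("kernel.audit.lost", 3, host=...)` yields `web01_example_com.kernel.audit.lost:3|g` with the default dot separator, or `web01_example_com_kernel.audit.lost:3|g` with `separator="_"`. The dot form gives the host its own level in a Graphite-style tree; the underscore form keeps the host fused to the first metric segment.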
As for session time, I wonder whether that kind of metric is already provided
by other parts of statsd/telegraf.
> Or perhaps if clients send events+heartbeats, when are they
> up/down? These are some of the questions I've heard from security
> overseers.
I suppose it would be easy enough to check the audisp-remote state report for
its information.
> And while some of these may not be inspected directly by the end users,
> in the case of trouble calls or questions they might be the exact thing
> I'd ask them to relay to me in order to diagnose a problem or answer a
> question remotely.
That's the idea with system metrics: to see the system getting into trouble in
real time before the user calls. There are other system metrics that can be
configured into statsd/telegraf, and there are standard dashboards for Linux
server metrics. What differs here is that these statistics are aimed
specifically at the audit daemon.
... and I forgot to ask: can you include a README there which specifies
the minimum kernel/userspace level of code required?
There is no minimum kernel version. It does need an audit-3.0 daemon in order
to dump internal state. However, if it doesn't find the state report, it
simply doesn't update those counters. So, in that respect, you could
transplant it to pretty much any audit daemon.
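A rough Python sketch of that fallback behavior follows. It assumes, without checking against the plugin source, that the state report is a text file of `name = value` lines (for example /var/run/auditd.state on an audit-3.0 system). The function names and the field-to-metric mapping are hypothetical. If the file is absent, the counters keep their previous values instead of being reset.

```python
from typing import Dict


def parse_state_report(text: str) -> Dict[str, str]:
    """Parse "name = value" lines into a dict (assumed format, see lead-in)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without "="
            fields[name.strip()] = value.strip()
    return fields


def update_from_state(counters: Dict[str, int], report_path: str,
                      mapping: Dict[str, str]) -> bool:
    """Refresh counters from the state report; return False if it is missing.

    `mapping` (hypothetical) maps report field names to metric names.
    Fields that are absent or not plain integers leave their counter untouched.
    """
    try:
        with open(report_path, encoding="utf-8") as f:
            fields = parse_state_report(f.read())
    except FileNotFoundError:
        # No state dump (e.g. an older daemon): keep previous values.
        return False
    for field_name, metric in mapping.items():
        raw = fields.get(field_name, "")
        if raw.isdigit():
            counters[metric] = int(raw)
    return True
```

Because a missing report only skips the update, the same loop can run against any audit daemon; only the counters that depend on the state dump stay unchanged.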
-Steve