Re: problems with slow auditing/journaling after updating to RHEL 6.8

Friday, 10 June 2016

On Friday, June 10, 2016 04:43:19 PM Jacobson, Robert C. [HONEYWELL TECHNOLOGY 
SOLUTIONS INC] wrote:
...
 I guess the first question I should ask is :  is this the proper list
for
 questions about problems?  If not, then please accept my apology -- and I
 would greatly appreciate if you could direct me to the proper list. 
Yes, this is the correct list.

...
 I recently upgraded a system to RHEL 6.8.
 kernel: 2.6.32-642.1.1.el6.x86_64
 audit:  audit-2.4.5-3.el6.x86_64

 My users and I immediately noticed a huge drop in application performance --
 and when I investigated further, I found a large amount of CPU wait. 
Look under audit

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/...

There was a bug fixed where the guarantee about writing to disk was not being 
met. There were two choices, fix the behavior so that you get guaranteed writes 
to disk or leave it alone and hope no one ever notices.

This led me to do a performance study during the Christmas break and I posted 
the results here:

https://www.redhat.com/archives/linux-audit/2015-December/msg00061.html

What I would recommend is changing flush to INCREMENTAL and freq to a number on 
that table that you think is the target performance you need. 100 is a good 
all around number.

The result of this performance study then led to a redesign of the logging 
code released in 2.5. The 2.5 logging is over 90 times faster except when 
using SYNC or DATA settings. Nothing can really help that.

-Steve

...
 Here's a graph from nagios data via splunk, over the past 48
hours.  The
 system was patched ~1300 (right about the middle of each graph -- the first
 spike), then rebooted at 1715 (the second spike):
 http://i.imgur.com/BuyazLH.jpg

 Here's some nmon snapshot during a "problem" period:
 http://i.imgur.com/YbQjeK2.jpg

 (note:  all logical volumes on sda5, dm-9 is the logical volume for audit
 logs)

 If I'm interpreting things correctly, it is due to drastic slow-down of
 journaling on my audit volume.  I don't know if this is a problem with the
 auditing subsystem, or the disk I/O subsystem, or ... what?

 Here's an excerpt from iotop, while I run a process that generates perhaps
 1000 audit log messages:

 Total DISK READ :       0.00 B/s | Total DISK WRITE :     160.53 K/s
 Actual DISK READ:       0.00 B/s | Actual DISK WRITE:     586.11 K/s
   PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  1458 be/3 root        0.00 B/s    0.00 B/s  0.00 % 97.23 % [jbd2/dm-9-8]
 24354 be/3 root        0.00 B/s  145.59 K/s  0.00 %  0.54 % auditd

 The total amount of data generated in the logs is actually very small --
 approximately 1 MB -- so I'm puzzled as to why it would take so long to
 write it out.

 The process that generates most of these audit messages takes 1.1 second to
 run with auditing off (or on a RHEL 6.7 system with auditing on or off).
 With auditing running, it takes 32 seconds to run.

 ## auditing is off:
 $ time nice -n 10 /product/bin/fclient.rb -d 3
 Logging in /tmp/ferret.log

 real    0m1.113s
 user    0m0.435s
 sys     0m0.643s

 ## auditing is on:
 $ time nice -n 10 /product/bin/fclient.rb -d 3
 Logging in /tmp/ferret.log

 real    0m32.088s
 user    0m0.394s
 sys     0m0.686s

 I'll note that the ruby version hasn't changed between 6.7 and 6.8 (still
 ruby-1.8.7.374-4.el6_6.x86_64)

 The system is using CFQ for disk I/O.  The filesystem is ext4.  Disks are
 all local SATA disks.

 I'd appreciate any guidance you have!

 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Robert Jacobson               Robert.C.Jacobson(a)nasa.gov
 Lead System Admin       Solar Dynamics Observatory (SDO)
 Bldg 14, E222                             (301) 286-1591

 --
 Linux-audit mailing list
 Linux-audit(a)redhat.com
 https://www.redhat.com/mailman/listinfo/linux-audit 

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

Re: problems with slow auditing/journaling after updating to RHEL 6.8