some logs about the issue // Re: Flush the hold queue fall into an infinite loop.

Thursday, 13 January 2022

The log is as follows:

...
 [  257.972293] CPU: 79 PID: 550 Comm: kauditd Kdump: loaded Tainted: G           OE    --------- -t - 4.18.0-147.5.2.5.h781.eulerosv2r10.x86_64 #1
 [  257.972294] Hardware name: Huawei CH121 V5/IT11SPCA1, BIOS 7.93 01/14/2021
 [  257.972295] Call Trace:
 [  257.972297]  <IRQ>
 [  257.972307]  dump_stack+0x6f/0xab
 [  257.972314]  watchdog_timer_fn+0x222/0x2e0
 [  257.972316]  ? watchdog+0x50/0x50
 [  257.972322]  __hrtimer_run_queues+0x125/0x2f0
 [  257.972326]  ? recalibrate_cpu_khz+0x10/0x10
 [  257.972329]  hrtimer_interrupt+0xe5/0x240
 [  257.972331]  ? sched_clock+0x5/0x10
 [  257.972334]  smp_apic_timer_interrupt+0x6a/0x130
 [  257.972336]  apic_timer_interrupt+0xf/0x20
 [  257.972337]  </IRQ>
 [  257.972341] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
 [  257.972343] Code: ff ff 7f 5b 44 89 e8 5d 41 5c 41 5d c3 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 c6 07
 [  257.972344] RSP: 0018:ffffb7d90e2d3e38 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
 [  257.972347] RAX: 0000000000000286 RBX: ffff9bb017d18b00 RCX: ffff9bb017d19900
 [  257.972347] RDX: ffffffff8fb8fef0 RSI: 0000000000000286 RDI: 0000000000000286
 [  257.972348] RBP: ffffffff8fb8fef0 R08: 000000000002b3a0 R09: ffffffff8e7829a2
 [  257.972349] R10: ffffd9126778fa00 R11: 00000000000f4240 R12: ffffffff8fb8ff04
 [  257.972350] R13: 0000000000000000 R14: ffff9bb017d18bf4 R15: ffff9bb017d18b00
 [  257.972356]  ? netlink_attachskb+0xb2/0x1d0
 [  257.972362]  skb_dequeue+0x57/0x70
 [  257.972367]  kauditd_send_queue+0x37/0x100
 [  257.972369]  ? kauditd_retry_skb+0x20/0x20
 [  257.972370]  ? kauditd_send_multicast_skb+0x90/0x90
 [  257.972372]  kauditd_thread+0xa5/0x230
 [  257.972377]  ? finish_wait+0x80/0x80
 [  257.972378]  ? auditd_reset+0x90/0x90
 [  257.972381]  kthread+0x10d/0x130
 [  257.972383]  ? kthread_flush_work_fn+0x10/0x10
 [  257.972385]  ret_from_fork+0x35/0x40
 [  269.972020] Sample cputime: 3999999736 ns(HZ: 1000)
 [  269.972022] Sample cpurate: 0 us, 3984966800 sy, 0 ni, 0 id, 0 wa, 15034536 hi, 0 si, 0 st
 [  269.972023] Sample softirq:
 [  269.972023] Sample hardirq:
 [  269.972232]         no hard irqs found.
 [  269.972233] watchdog: BUG: soft lockup - CPU#79 stuck for 22s! [kauditd:550]
Thanks.

On 2022/1/13 19:56, cuigaosheng wrote:
...
 When we add "audit=1" to the cmdline, kauditd takes up 100% of a CPU.
 The setup is as follows:

     configurations:
     	auditctl -b 64
     	auditctl --backlog_wait_time 60000
     	auditctl -r 0
     	auditctl -w /root/aaa  -p wrx
     shell script:
     	#!/bin/bash
     	i=0
     	while [ $i -le 66 ]
     	do
     	    touch /root/aaa
     	    let i++
     	done
     mandatory conditions:

         add "audit=1" to the cmdline, and run kill -19 pid_number
         (where pid_number is the PID of /sbin/auditd).

   As long as audit_hold_queue stays non-empty, flushing the hold queue falls into
   an infinite loop.

>  713 static int kauditd_send_queue(struct sock *sk, u32 portid,
>  714                               struct sk_buff_head *queue,
>  715                               unsigned int retry_limit,
>  716                               void (*skb_hook)(struct sk_buff *skb),
>  717                               void (*err_hook)(struct sk_buff *skb))
>  718 {
>  719         int rc = 0;
>  720         struct sk_buff *skb;
>  721         unsigned int failed = 0;
>  722
>  723         /* NOTE: kauditd_thread takes care of all our locking, we just use
>  724          *       the netlink info passed to us (e.g. sk and portid) */
>  725
>  726         while ((skb = skb_dequeue(queue))) {
>  727                 /* call the skb_hook for each skb we touch */
>  728                 if (skb_hook)
>  729                         (*skb_hook)(skb);
>  730
>  731                 /* can we send to anyone via unicast? */
>  732                 if (!sk) {
>  733                         if (err_hook)
>  734                                 (*err_hook)(skb);
>  735                         continue;
>  736                 }
>  737
>  738 retry:
>  739                 /* grab an extra skb reference in case of error */
>  740                 skb_get(skb);
>  741                 rc = netlink_unicast(sk, skb, portid, 0);
>  742                 if (rc < 0) {
>  743                         /* send failed - try a few times unless fatal error */
>  744                         if (++failed >= retry_limit ||
>  745                             rc == -ECONNREFUSED || rc == -EPERM) {
>  746                                 sk = NULL;
>  747                                 if (err_hook)
>  748                                         (*err_hook)(skb);
>  749                                 if (rc == -EAGAIN)
>  750                                         rc = 0;
>  751                                 /* continue to drain the queue */
>  752                                 continue;
>  753                         } else
>  754                                 goto retry;
>  755                 } else {
>  756                         /* skb sent - drop the extra reference and continue */
>  757                         consume_skb(skb);
>  758                         failed = 0;
>  759                 }
>  760         }
>  761
>  762         return (rc >= 0 ? 0 : rc);
>  763 }
 When kauditd attempts to flush the hold queue, the queue parameter is
 &audit_hold_queue. If netlink_unicast (line 741) fails with -EAGAIN and the
 retry limit is reached (line 744), sk is set to NULL (line 746) and err_hook
 (kauditd_rehold_skb) is called, which puts the skb back on the hold queue.
 After the continue, skb_dequeue (line 726) and err_hook (kauditd_rehold_skb,
 line 733) keep dequeuing and re-queuing skbs on the same queue, so the loop
 never terminates.
 I don't really understand the value of audit_hold_queue. Can we remove it, or
 stop dropping the logs into kauditd_rehold_skb when auditd is in an abnormal
 state?
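
 To make the loop above concrete, here is a small user-space sketch (not
 kernel code). The struct queue type and the q_push, q_pop and drain helpers
 are hypothetical stand-ins for sk_buff_head, the requeue done by err_hook,
 skb_dequeue and kauditd_send_queue. The requeue here appends to the tail for
 simplicity; where the record is reinserted does not change whether the loop
 terminates. The "bounded" variant is only one possible mitigation, assumed
 for illustration, and is not taken from any kernel patch.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 64

/* Minimal FIFO standing in for struct sk_buff_head (hypothetical). */
struct queue {
    int items[QCAP];
    size_t head, len;
};

static void q_push(struct queue *q, int v)
{
    assert(q->len < QCAP);
    q->items[(q->head + q->len) % QCAP] = v;
    q->len++;
}

static int q_pop(struct queue *q, int *v)
{
    if (q->len == 0)
        return 0;
    *v = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    return 1;
}

/*
 * Drain q while the receiver is unavailable (the sk == NULL case): every
 * record goes to the error hook, which re-queues it onto the same queue.
 *
 * bounded == 0: loop until the queue is empty, like the quoted while loop.
 * bounded != 0: handle only the records present on entry (assumed mitigation).
 *
 * Returns the number of iterations executed. cap is a safety limit that
 * exists only so the unbounded variant terminates in this demo.
 */
static unsigned long drain(struct queue *q, unsigned long cap, int bounded)
{
    unsigned long iters = 0;
    size_t budget = q->len;   /* snapshot taken before draining */
    int rec;

    while (iters < cap && (!bounded || budget > 0) && q_pop(q, &rec)) {
        q_push(q, rec);       /* err_hook: put the record back on the queue */
        iters++;
        if (bounded)
            budget--;
    }
    return iters;
}
```

 With three records queued, drain(&q, 1000, 0) runs until it hits the
 1000-iteration cap because the queue never becomes empty, while
 drain(&q, 1000, 1) stops after three iterations and leaves the three
 records on the queue.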

 Looking forward to your reply. Thank you very much.
 Gaosheng.
