osquery loads a fairly minimal set of audit rules depending on its
configuration. In our case we only have a single rule for execve:
-a always,exit -S execve
I'll try finding that patch in the LTS kernel series upon which the Amazon
kernel is based and try upgrading past that. Thanks all.
On Fri, Aug 17, 2018 at 2:52 PM Paul Moore <paul(a)paul-moore.com> wrote:
On Fri, Aug 17, 2018 at 5:24 PM Preston Bennes
<preston.bennes(a)snowflake.net> wrote:
> Hey Paul,
>
> My expectations are aligned to what you've pointed out already
(not-vanilla old kernel, user space client other than auditd).
>
> Primarily what I'm looking for is a sanity check on the null pointer
deref on current vanilla kernel code because I think the rest of the issues
stem from that. My C isn't sharp enough to try my hand at a reproduction
case, though I would hope someone with the requisite expertise could put
one together relatively quickly. specifically this message:
>
>> audit: type=1305 audit(1534170894.852:117649234): auid=501 ses=102936
op="add_rule" key=(null) list=4 res=0
Unfortunately without digging into Amazon's source for that kernel
build it is difficult to come to any strong conclusion.
The audit message you are seeing above is an AUDIT_CONFIG_CHANGE (type
== 1305), and based on the 'op="add_rule"' field it appears as though
the system is adding an exit (list == 4) filter rule to the kernel.
The operation failed (res == 0), which shouldn't be surprising given
the kernel oops.
I'm not very familiar with osquery, does it load it's own audit filter
rules? Are you loading some of your own?
> On Fri, Aug 17, 2018 at 2:12 PM Paul Moore <paul(a)paul-moore.com> wrote:
>>
>> On Fri, Aug 17, 2018 at 4:26 PM Preston Bennes
>> <preston.bennes(a)snowflake.net> wrote:
>> > Hey Audit team,
>>
>> Hey Audit user!
>>
>> > I've got a kernel null pointer deref oops on Amazon Linux kernel
4.9.51-10.52.amzn1.x86_64. After the oops all new cron processes that
spawned were stuck in D wait on some audit related syscall. This exhausted
system file handles and the box ended up needing to be rebooted
out-of-bounds. The audit handle was held by
https://github.com/facebook/osquery/.
>> >
>> > It looks like there was an issue with audit sending to osquery
(netlink_unicast sending to pid 21233, error -111) followed by something
(presumably also osquery) attempting to 'op="add_rule" key=(null)'.
>>
>> The first thing I need to comment on is that since this is a distro
>> specific kernel (4.9.51-10.52.amzn1.x86_64) and not an upstream kernel
>> there is a limit in the amount of help we can provide. Generally the
>> upstream list focuses on the upstream code and we leave support for
>> the distro specific builds/packages up to the distro. If you can
>> reproduce your problem on an upstream kernel, with a strong preference
>> for a *recent* upstream kernel, we can probably be of more help. I
>> simply don't know what patches Amazon merges into it's amzn1 kernels.
>>
>> With that out of the way, I will add that v4.9.x is a rather old
>> kernel, and we've made a number of relevant improvements since the
>> v4.9 release and it is unlikely all of them have been backported to
>> your kernel. If possible, I would suggest upgrading to a newer kernel
>> as it may resolve your problem.
>>
>> There is also a possible issue of osquery as an auditd replacement;
>> it's unlikely this is being caused by osquery, but we don't test with
>> osquery to that save level as we do with Steve's audit userspace
>> (auditd).
>>
>> > Here's from /var/log/messages.
>> >
>> >> Aug 13 14:34:54 packer_default-10-180-21-24 kernel: [4749520.133904]
audit: netlink_unicast sending to audit_pid=21233 returned error: -111
>> >> Aug 13 14:34:54 packer_default-10-180-21-24 kernel: [4749520.137982]
audit_log_lost: 12 callbacks suppressed
>> >> Aug 13 14:34:54 packer_default-10-180-21-24 kernel: [4749520.137983]
audit: audit_lost=36081041 audit_rate_limit=8192 audit_backlog_limit=4096
>> >> Aug 13 14:34:54 packer_default-10-180-21-24 kernel: [4749520.137985]
audit: type=1305 audit(1534170894.852:117649229): audit_pid=21233 old=21233
auid=501 ses=102936 res=0
>> >> Aug 13 14:34:54 packer_default-10-180-21-24 kernel: [4749520.137986]
audit: type=1305 audit(1534170894.852:117649230): audit_enabled=1 old=1
auid=501 ses=102936 res=1
>> >> Aug 13 14:34:54 packer_default-10-180-21-24 kernel: [4749520.137987]
audit: type=1305 audit(1534170894.852:117649231): audit_backlog_wait_time=1
old=1 auid=501 ses=102936 res=1
>> >> Aug 13 14:34:54 packer_default-10-180-21-24 kernel: [4749520.137989]
audit: type=1305 audit(1534170894.852:117649232): audit_backlog_limit=4096
old=4096 auid=501 ses=102936 res=1
>> >> Aug 13 14:34:54 packer_default-10-180-21-24 kernel: [4749520.137990]
audit: type=1305 audit(1534170894.852:117649233): audit_failure=0 old=0
auid=501 ses=102936 res=1
>> >> Aug 13 14:34:54 packer_default-10-180-21-24 kernel: [4749520.137991]
audit: type=1305 audit(1534170894.852:117649234): auid=501 ses=102936
op="add_rule" key=(null) list=4 res=0
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.138007]
BUG: unable to handle kernel NULL pointer dereference at 00000000000001e0
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.143872]
IP: [<ffffffff814a76da>] netlink_unicast+0x4a/0x230
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.148336]
PGD 0
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.149822]
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.151230]
Oops: 0000 [#1] SMP
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.153736]
Modules linked in: iptable_filter(E) ip_tables(E) x_tables(E) udp_diag(E)
nfnetlink_queue(E) nfnetlink_log(E) nfnetlink(E) tcp_diag(E) inet_diag(E)
isofs(E) ipv6(E) crc_
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.184036]
CPU: 0 PID: 21372 Comm: osqueryd Tainted: G E
4.9.51-10.52.amzn1.x86_64 #1
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.190620]
Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.195513]
task: ffff8807fa1c3b00 task.stack: ffffc900079a4000
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.200179]
RIP: 0010:[<ffffffff814a76da>] [<ffffffff814a76da>]
netlink_unicast+0x4a/0x230
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.206822]
RSP: 0018:ffffc900079a7c38 EFLAGS: 00010246
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.211065]
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.216779]
RDX: 00001ffffffffffe RSI: 00000000024000c0 RDI: 0000000000000014
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.222473]
RBP: ffffc900079a7c68 R08: 0000000000000004 R09: ffff88062a375414
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.228167]
R10: ffff8808004032c0 R11: ffff88062a375400 R12: 0000000000000000
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.233738]
R13: ffff8804200a0100 R14: 00000000000052f1 R15: 0000000000016d42
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.238999]
FS: 00007f3d40cc4700(0000) GS:ffff880800a00000(0000) knlGS:0000000000000000
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.244796]
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.249555]
CR2: 00000000000001e0 CR3: 000000059e63d000 CR4: 00000000001406f0
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.255401]
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.260725]
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.266038]
Stack:
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.267680] 00000000079a7c58 00000000000052f1 00000000000052f1
00000000ffffffa1
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.273540] ffff88062a376800 ffff8804200a1d00 ffffc900079a7cf8
ffffffff811107d2
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.279379] ffffc90000000004 ffffffff811e2f85 ffffc900079a7ca8
ffffffff8145a08c
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.285201]
Call Trace:
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.287137] [<ffffffff811107d2>] audit_receive_msg+0x992/0xc20
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.291561] [<ffffffff811e2f85>] ?
__kmalloc_node_track_caller+0x35/0x260
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.296642] [<ffffffff8145a08c>] ? release_sock+0x8c/0xa0
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.300718] [<ffffffff81110ab2>] audit_receive+0x52/0xa0
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.304713] [<ffffffff814a77ef>] netlink_unicast+0x15f/0x230
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.308943] [<ffffffff814a7bdc>] netlink_sendmsg+0x31c/0x390
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.313236] [<ffffffff81455818>] sock_sendmsg+0x38/0x50
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.317219] [<ffffffff81455c4f>] SYSC_sendto+0xef/0x170
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.321142] [<ffffffff811156ab>] ? __audit_syscall_entry+0xeb/0x100
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.325853] [<ffffffff81003437>] ? syscall_trace_enter+0x1b7/0x290
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.330464] [<ffffffff811158b3>] ? __audit_syscall_exit+0x1f3/0x280
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.335189] [<ffffffff8145673e>] SyS_sendto+0xe/0x10
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.338952] [<ffffffff81003854>] do_syscall_64+0x54/0xc0
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.342995] [<ffffffff8153d32b>] entry_SYSCALL64_slow_path+0x25/0x25
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.347840]
Code: 65 8b 05 a2 5b b6 7e 25 00 ff 00 00 83 f8 01 19 f6 81 e6 a0 00 38 00
81 c6 20 00 08 02 e8 af cf ff ff 49 89 c5 31 c0 85 db 75 08 <49> 8b 84 24
e0 01 00 00 48 89 45
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.368625]
RIP [<ffffffff814a76da>] netlink_unicast+0x4a/0x230
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel:
[4749520.373253] RSP <ffffc900079a7c38>
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.376013]
CR2: 00000000000001e0
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.379307]
---[ end trace 6a41b2274729ba2a ]---
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.663029]
audit: type=1300 audit(1534170895.379:117649235): arch=c000003e syscall=59
success=yes exit=0 a0=271cce0 a1=2741dd0 a2=26dc3b0 a3=7ffdd7f1ea80 items=2
ppid=4556 pid=2407
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.682741]
audit: type=1309 audit(1534170895.379:117649235): argc=2 a0="date"
a1="+%s"
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 kernel: [4749520.688585]
audit: type=1307 audit(1534170895.379:117649235): cwd="/"
>> >> Aug 13 14:34:55 packer_default-10-180-21-24 abrt-dump-oops: Reported
1 kernel oopses to Abrt
>> >
>> > --
>> > Linux-audit mailing list
>> > Linux-audit(a)redhat.com
>> >
https://www.redhat.com/mailman/listinfo/linux-audit
>>
>> --
>> paul moore
>>
www.paul-moore.com
--
paul moore
www.paul-moore.com