On Mon, 2014-02-10 at 11:01 -0800, Andy Lutomirski wrote:
On Mon, Feb 10, 2014 at 9:29 AM, Andy Lutomirski
<luto(a)amacapital.net> wrote:
> On Mon, Feb 10, 2014 at 8:57 AM, Oleg Nesterov <oleg(a)redhat.com> wrote:
>> On 02/08, Andy Lutomirski wrote:
>>>
>>> +void audit_inc_n_rules()
>>> +{
>>> + struct task_struct *p, *t;
>>> +
>>> + read_lock(&tasklist_lock);
>>> + audit_n_rules++;
>>> + smp_wmb();
>>> + if (audit_n_rules == 1) {
>>> + /*
>>> + * We now have a rule; we need to hook syscall entry.
>>> + */
>>> + for_each_process_thread(p, t) {
>>> + if (t->audit_context)
>>> + set_tsk_thread_flag(t, TIF_SYSCALL_AUDIT);
>>> + }
>>> + }
>>> + read_unlock(&tasklist_lock);
>>> +}
>>> +
>>> +void audit_dec_n_rules()
>>> +{
>>> + read_lock(&tasklist_lock);
>>> + --audit_n_rules;
>>> + BUG_ON(audit_n_rules < 0);
>>> +
>>> + /*
>>> + * If audit_n_rules == 0, then __audit_syscall_exit will clear
>>> + * TIF_SYSCALL_AUDIT.
>>> + */
>>> +
>>> + read_unlock(&tasklist_lock);
>>> +}
>>
>> To be honest, I do not understand why _dec_ takes tasklist_lock...
>> And why _inc_ increments audit_n_rules under tasklist.
>
> Bah, incorrect leftover from last time.
>
>>
>>> @@ -1528,6 +1562,25 @@ void __audit_syscall_exit(int success, long
return_code)
>>> context->filterkey = NULL;
>>> }
>>> tsk->audit_context = context;
>>> +
>>> + if (ACCESS_ONCE(audit_n_rules) == 0) {
>>> + /*
>>> + * Either this is the very first syscall by this process or
>>> + * audit_dec_n_rules recently set audit_n_rules to zero.
>>> + */
>>> + smp_rmb();
>>
>> rmb() looks wrong, we need mb() to serialize ACCESS_ONCE() and
>> clear_tsk_thread_flag().
>
> I clearly need to review the rules. I think you're right, though --
> no barrier should be needed.
>
>>
>> But, otoh, I think we do not need any barrier at all, we can rely on
>> control dependency. See the recent 18c03c61444a21 "Documentation/
>> memory-barriers.txt: Prohibit speculative writes".
>>
>>> + /* audit_inc_n_rules could increment audit_n_rules here... */
>>> +
>>> + clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
>>> +
>>> + smp_rmb();
>>
>> Again, I guess this should be mb() or smp_mb__after_clear_bit().
>>
>>
>> And I still think this needs more changes. Once again, I do not think
>> that, say, __audit_log_bprm_fcaps() should populate context->aux if
>> !TIF_SYSCALL_AUDIT, this list can grow indefinitely. Or
__audit_signal_info()...
>>
>> Perhaps __audit_syscall_exit() should also set context->dummy?
>
> That would work.
>
> I'm still torn between trying to make it possible for things like
> __audit_log_bprm_fcaps to start a syscall audit record in the middle
> of a syscall or to just try to tighten up the current approach to the
> point where it will work correctly.
Personally, I'd say just hand the next syscall_entry(), don't try to get
that race closed that fast... "The first syscall after a rule is added
will be audited"
This is worse than I thought. Things like signal auditing can enter
the audit system from outside of a syscall.
Not sure what you mean here. The only place I know of that we do
something like that is signal delivery to auditd itself, which is a
horrible nasty ugly ungodly terrible eat-your-babies hack... We'd have
to special case that hack to not pay attention to TIF_SYSCALL_AUDIT or
audit_dummy_context(), but the rest might be fixable if we set/unset the
dummy spot...
I don't think there's
currently any way to tell whether you're in a syscall (when
TIF_SYSCALL_AUDIT is clear) so getting this to work right would
require arch help.
Don't understand why this is needed...
I'll ask what people on the Fedora list think about just changing
the
default to -t task,never.
--Andy