On Monday, January 12, 2015 09:11:21 AM Imre Palik wrote:
On 01/08/15 22:53, Paul Moore wrote:
> On Tuesday, January 06, 2015 03:51:20 PM Imre Palik wrote:
>> From: "Palik, Imre" <imrep(a)amazon.de>
>>
>> When file auditing is enabled, during a low memory situation, a memory
>> allocation with __GFP_FS can lead to pruning the inode cache. Which can,
>> in turn lead to audit_tree_freeing_mark() being called. This can call
>> audit_schedule_prune(), that tries to fork a pruning thread, and
>> waits until the thread is created. But forking needs memory, and the
>> memory allocations there are done with __GFP_FS.
>>
>> So we are waiting merrily for some __GFP_FS memory allocations to
>> complete,
>> while holding some filesystem locks. This can take a while ...
>>
>> This patch creates a single thread for pruning the tree from
>> audit_add_tree_rule(), and thus avoids the deadlock that the on-demand
>> thread creation can cause.
>>
>> Reported-by: Matt Wilson <msw(a)amazon.com>
>> Cc: Matt Wilson <msw(a)amazon.com>
>> Signed-off-by: Imre Palik <imrep(a)amazon.de>
>
> Thanks for sticking with this and posting a revised patch, my comments are
> inline with the patch below ... also as a FYI, when sending a revised
> patch it is often helpful to put a revision indicator in the subject
> line, as an>
> example:
> "[RFC PATCH v2] audit: make audit less awful"
>
> It's not strictly necessary, but it makes my life just a little bit
> easier.
Sorry for that. I realised that I botched the subject immediately after
sending the mail :-(
No worries, you'll take care of it next time.
>> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
>> index 0caf1f8..0ada577 100644
>> --- a/kernel/audit_tree.c
>> +++ b/kernel/audit_tree.c
>
> ...
>
>> +static int launch_prune_thread(void)
>> +{
>> + prune_thread = kthread_create(prune_tree_thread, NULL,
>> + "audit_prune_tree");
>> + if (IS_ERR(prune_thread)) {
>> + audit_panic("cannot start thread audit_prune_tree");
>
> I'm not certain audit_panic() is warranted here, pr_err() might be a
> better choice. What is the harm if the thread doesn't start and we return
> an error code?
I thought disabling the bigger part of the file auditing would warrant a
bigger bang than pr_err(). If you think, it is an overkill, then I can
change it.
But see my comment below in audit_schedule_prune()
My concern with audit_panic() is that it only generates a panic() in the
*very* rare circumstance that someone has configured it that way. While there
are some users who do configure their systems with AUDIT_FAIL_PANIC, I think
it is safe to say that most do not. Further, audit_panic() can be rate
limited or even silenced in the case of AUDIT_FAIL_SILENT.
The choice of pr_err() is not perfect for all scenarios, but I think it is the
better choice for most of the common scenarios.
>> + prune_thread = NULL;
>> + return -ENOSYS;
>
> Out of curiosity, why ENOSYS?
I thought it is a bit less confusing than ESRCH.
Originally I was thinking about -ENOMEM, thoughts?
>> + } else {
>> + wake_up_process(prune_thread);
>> + return 0;
>> + }
>> +}
>
> See my comments below in audit_schedule_prune().
>
>> /* called with audit_filter_mutex */
>> int audit_add_tree_rule(struct audit_krule *rule)
>> {
>>
>> @@ -663,6 +713,12 @@ int audit_add_tree_rule(struct audit_krule *rule)
>>
>> /* do not set rule->tree yet */
>> mutex_unlock(&audit_filter_mutex);
>>
>> + if (unlikely(!prune_thread)) {
>> + err = launch_prune_thread();
>> + if (err)
>> + goto Err;
>> + }
>> +
>
> Why not put this at the top of audit_add_tree_rule()?
I would like to do this without holding audit_filter_mutex.
Sorry, I forgot that audit_add_tree_rule() is called with the
audit_filter_mutex locked.
>> err = kern_path(tree->pathname, 0, &path);
>> if (err)
>>
>> goto Err;
>>
>> @@ -713,6 +769,9 @@ int audit_tag_tree(char *old, char *new)
>>
>> struct vfsmount *tagged;
>> int err;
>>
>> + if (!prune_thread)
>> + return -ENOSYS;
>
> Help me out - why?
Still, why?
>> err = kern_path(new, 0, &path2);
>> if (err)
>>
>> return err;
>>
>> @@ -800,36 +859,11 @@ int audit_tag_tree(char *old, char *new)
>>
>> return failed;
>>
>> }
>>
>> -/*
>> - * That gets run when evict_chunk() ends up needing to kill audit_tree.
>> - * Runs from a separate thread.
>> - */
>> -static int prune_tree_thread(void *unused)
>> -{
>> - mutex_lock(&audit_cmd_mutex);
>> - mutex_lock(&audit_filter_mutex);
>> -
>> - while (!list_empty(&prune_list)) {
>> - struct audit_tree *victim;
>> -
>> - victim = list_entry(prune_list.next, struct audit_tree, list);
>> - list_del_init(&victim->list);
>> -
>> - mutex_unlock(&audit_filter_mutex);
>> -
>> - prune_one(victim);
>> -
>> - mutex_lock(&audit_filter_mutex);
>> - }
>> -
>> - mutex_unlock(&audit_filter_mutex);
>> - mutex_unlock(&audit_cmd_mutex);
>> - return 0;
>> -}
>>
>> static void audit_schedule_prune(void)
>> {
>>
>> - kthread_run(prune_tree_thread, NULL, "audit_prune_tree");
>> + BUG_ON(!prune_thread);
>> + wake_up_process(prune_thread);
>>
>> }
>
> First, I probably wasn't clear last time so I'll be more clear now: no
> BUG_ON() here, handle the error.
As far as I see, I disabled all the codepaths that can reach this point with
!prune_thread. So I thought leaving the BUG_ON() here doesn't really
matter.
If it doesn't do anything, then you can remove it ;)
BUG_ON()/BUG() does have its uses, but I'm not sure this in one of those
cases.
> Second, and closely related to the last sentence, perhaps the
right
> approach is to merge the launch_prune_thread() code with
> audit_schedule_prune() such that we only have one function which starts
> the thread if it isn't present, and wakes it up if it is, something like
> the following:
>
> static int audit_schedule_prune(void)
> {
>
> if (!prune_thread) {
>
> prune_thread = kthread_create(...);
> if (IS_ERR(prune_thread)) {
>
> pr_err("cannot start thread audit_prune_tree");
> prune_thread = NULL;
> return -ENOSYS;
>
> }
>
> }
>
> wake_up_process(prune_thread);
> return 0;
>
> }
if we do the thread creation in audit_schedule_prune, we won't necessarily
need the dedicated thread. This would be the alternative approach I
mentioned in the comment part of my original mail. Sorry if I was not
clear enough.
The code in the snippet I provided starts the thread if it doesn't exist, and
wakes the thread if it exists. I don't understand how that is different from
what you were doing, I just happen to think it is a little more robust.
Basically I see the following approaches:
1) dedicated thread created on syscall entry, with disabling file auditing
as long as the thread cannot be created.
2) on-demand thread-creation/destruction with some serialisation to ensure
that we won't create too many threads.
3) dedicated thread created on syscall entry, with fallback to thread
creation at cleanup time, if original thread creation fails.
Am I right that you would like to see the third one?
I don't want #2, and I think in general we should do whatever we can to
recover from errors such that we don't disable auditing. That is just good
practice.
>> /*
>>
>> @@ -896,9 +930,10 @@ static void evict_chunk(struct audit_chunk *chunk)
>>
>> for (n = 0; n < chunk->count; n++)
>>
>> list_del_init(&chunk->owners[n].list);
>>
>> spin_unlock(&hash_lock);
>>
>> + mutex_unlock(&audit_filter_mutex);
>>
>> if (need_prune)
>>
>> audit_schedule_prune();
>>
>> - mutex_unlock(&audit_filter_mutex);
>> +
>>
>> }
>
> Remove that trailing empty vertical whitespace please. If you aren't
> using it already, you should look into scripts/checkpatch.pl to sanity
> check your patches before sending.
Could you point me to that whitespace? I cannot see it in the patch, and
scripts/checkpatch.pl was not complaining either.
In your patch it looks like there is a blank empty line at the end of
evict_chunk(); it appears like you are replacing the last mutex_unlock() with
blank line.
--
paul moore
www.paul-moore.com