Audit container ID patchset v11 comments

Tuesday, 5 October 2021

Hi Richard and audit aficionados,

Now that the io_uring fixes are in linux-next and no one is screaming
too loudly about it, it's time to turn the focus back to the audit
container ID patchset.  Unfortunately I've lost Richard's v11 draft
from my inbox so I'm going to provide my feedback in this mail, broken
up into sections for each patch; hopefully it isn't too painful to
read ...

*** [PATCH ghak90 v11 01/11] audit: collect audit task parameters
- This would come out during the forward-port/rebase, but the io_uring
specific accessors are no longer needed.
- Minor stuff, but if audit_alloc() returns the tsk->audit pointer you
could simplify things a bit, see the the audit_alloc() case below.  I
also moved the loginuid/sessionid setting into audit_alloc() so we do
less work in the audit_set_loginuid() case.

  static audit_task_info *audit_alloc_task(struct task_struct *tsk)
  {
    if (!tsk->audit)
      tsk->audit = kmem_cache_alloc(audit_task_cache, GFP_KERNEL);
    return tsk->audit;
  }

  int audit_alloc(struct task_struct *tsk)
  {
    int rc;
    struct audit_task_info *ati;

    ati = audit_alloc_task(tsk);
    if (!ati)
      return -ENOMEM;
    ati->loginuid = audit_get_loginuid(current);
    ati->sessionid = audit_get_sessionid(current);

    rc = audit_alloc_syscall(tsk);
    if (rc) {
      tsk->audit = NULL;
      kmem_cache_free(audit_task_cache, ati);
    }

    return rc;
  }

- I'm keeping this comment separate from the one above, although it is
very much related and could simplify things further.  Since we are
calling audit_alloc_task() from copy_process (copy_process ->
audit_alloc -> audit_alloc_task) and copy_process will fail to create
the process if audit_alloc()/audit_alloc_task() fails, why not fold
audit_alloc_task() into audit_alloc() and just drop all of the other
audit_alloc_task() calls?  Unless the kernel completely fails, all of
the other audit_alloc_task() callers should already have the
current->audit field allocated.  Can you think of any case where this
wouldn't work?  Am I missing something (maybe in the later patches)?
- Related to the above, if we assume that a valid task_struct *always*
has a task_struct->audit field then there are a number of
task_struct->audit checks in this patchset that can probably be
removed.

*** [PATCH ghak90 v11 02/11] audit: add container id
- Perhaps it gets used more in later patches, but audit_get_contid()
is only used by one caller and it seems like it could be easily
replaced by _audit_contobj_get() and a dereference.  Doing so would
shrink/simplify the patch a bit.
- I noticed that the audit_set_contid() function definition comment
block mentions setting the contid on the "current" task, which isn't
always correct, see proc_contid_write() and the fact that it takes an
arbitrary task_struct as a parameter.
- Another minor point, but when injecting the task into an existing
container you can leverage the _audit_contobj_get() return value (and
do some other simplifications), for example:

  /* task injection to existing container */
  if (current != cont->owner) {
    spin_unlock(&_audit_contobj_list_lock);
    rc = -ENOTUNIQ;
    goto error;
  }
  newcont = _audit_contobj_get(cont);

- I might change the "error" jump label in audit_set_contid() to
"out"
since that code is executed in both the normal and error paths.
- You are calling _audit_contobj_put(oldcont) twice at the end of
audit_set_contid() in the normal/non-error case; this doesn't seem
correct.
- The RCU read lock taken around _audit_contobj_get_bytask() in
audit_alloc_task() could probably be moved inside audit_alloc_task() -
although see my comments for patch #1 - as anyone getting a reference
should probably be holding the RCU lock.
- The locking around and inside of _audit_contobj_put() is
interesting.  We should consider moving the spinlock inside the
function similar to the RCU lock and _audit_contobj_get_bytask()
comment above, but I think we also might be able to help limit when
the spinlock is taken as well.  There are also some corner cases with
respect to contobj lifetimes and owner references that need
correction.  I believe something like this should improve things:

  static void audit_contobj_free(struct rcu_head *entry)
  {
    struct audit_contobj *cont = container_of(entry, struct audit_contobj, rcu);
    put_task_struct(cont->owner);
    kfree(cont);
  }

  static void _audit_contobj_put(struct audit_contobj *cont)
  {
    if (!cont)
      return;
    if (!refcount_dec_and_lock(&cont->refcount, &_audit_contobj_list_lock))
      return;
    list_del_rcu(&cont->list);
    spin_unlock(&_audit_contobj_list_lock);
    call_rcu(&cont->rcu, audit_contobj_free);
  }

  int audit_set_contid(struct task_struct *tsk, u64 contid)
  {
    /* ... */

    /* NOTE: rcu lock, tasklist lock, and task lock already held */

    h = audit_hash_contid(contid);
    list_for_each_entry_rcu(cont, &audit_contid_hash[h], list) {
      if (refcount_read(&cont->refcount) == 0)
        continue;
      if (cont->id == contid) {
        /* task injection to existing container */
        if (current != cont->owner) {
          rc = -ENOTUNIQ;
          /* cleanup, error handling, etc. */
        }
        newcont = _audit_contobj_get(cont);
        break;
      }
    }

    if (!newcont) {
      newcont = kmalloc(sizeof(*newcont), GFP_ATOMIC);
      if (!newcont) {
        rc = -ENOMEM;
        /* cleanup, error handling, etc. */
      }
      INIT_LIST_HEAD(&newcont->list);
      newcont->id = contid;
      newcont->owner = get_task_struct(current);
      refcount_set(&newcont->refcount, 1);
      spin_lock(&_audit_contobj_list_lock);
      list_add_rcu(&newcont->list, &audit_contid_hash[h]);
      spin_unlock(&_audit_contobj_list_lock);
    }

    info->count = newcont;

    /* ... */
  }

*** [PATCH ghak90 v11 03/11] audit: log container info of syscalls
- It would be good to explain the significance of the "record=" field
in the AUDIT_CONTAINER_ID record in this patch.  Currently it is just
an increasing counter with no linkage to anything, if it has meaning
then it should be documented here, if not then we should consider
dropping it.
- It looks like you created audit_contobj_get_bytask() here because
audit_contobj is defined in audit.c; if you defined audit_contobj and
declared the necessary supporting functions in kernel/audit.h these
shenanigans wouldn't be necessary, correct?  I suspect this would
improve/eliminate a number of other functions in this patch too.
- Nit picky, but I wonder if audit_contobj_put() would make more sense
named audit_contobj_mput()?
- Why does audit_log_container_id_ctx() take an audit_context
parameter when it always internally references the current task for
the audit container ID?  Why not simply take the audit_context from
the current task_struct?

*** [PATCH ghak90 v11 04/11] audit: add contid support for signalling
the audit daemon
- The signal caller reset in audit_free() is likely not what we want
as a task could signal/kill auditd and exit, clearing it's info before
auditd could request the signal caller info.  We should be able to
remove the audit_sig_adtsk variable as it is unnecessary.
- I think the only time we should "put" the audit_sig_cid contobj is
when auditd requests the AUDIT_SIGNAL_INFO or AUDIT_SIGNAL_INFO2 info
(clear it in both cases as userspace might not be acid aware).
- Locking and refcounting of the contobj in the signal case is tricky,
we've talked about that in past drafts, and it's complicated here by
some locking that doesn't seem correct in all cases (see my comments
regarding patch #3). Working in the suggestions from patch #3 I end up
with something like this:

  static void _audit_contobj_put(struct audit_contobj *cont)
  {
    if (!cont)
      return;
    if (!refcount_dec_and_lock(&cont->refcount, &_audit_contobj_list_lock))
      return;
    list_del_rcu(&cont->list);
    spin_unlock(&_audit_contobj_list_lock);
    if (refcount_read(&cont->sigcount) == 0)
      call_rcu(&cont->rcu, audit_contobj_free);
  }

  static struct audit_contobj *_audit_contobj_get_sig(void)
  {
    struct audit_cont_obj *cont = current->audit->cont;
    refcount_inc(&cont->sigcount);
    return cont;
  }

  static void _audit_contobj_put_sig(struct audit_contobj *cont)
  {
    if (refcount_dec_and_test(&cont->sigcount) == 0 &&
        refcount_read(&cont->refcount) == 0))
      audit_contobj_free(cont->rcu);
  }

*** [PATCH ghak90 v11 05/11] audit: add support for non-syscall
auxiliary records
- Porting needed due to the io_uring changes, but it seems okay otherwise.

*** [PATCH ghak90 v11 06/11] audit: add containerid support for user records
- Nothing to say here.

*** [PATCH ghak90 v11 07/11] audit: add containerid filtering
- Likely porting is needed, but okay otherwise.

*** [PATCH ghak90 v11 08/11] audit: add support for containerid to
network namespaces
- While the atomicity of refcount is overkill here, it might be wise
to use refcount_t in place of the integer counter in
audit_contobj_netns so we get all the nice rollover, etc. protections.
- There are some interesting locking choices going on here, but I
think some of the changes in the prior patches are going to have an
impact here so I'm not going to look too hard at that right now, I'll
revisit it in the next revision.
- See my previous comments about the "record=" field, they apply here as well.

*** [PATCH ghak90 v11 09/11] audit: contid check descendancy and nesting
- I realize it's tricky, but the commit description could use a bit
more work to make the relationships between the current task, the
target task, and the audit container ID owner more clear.  Examples
are never a bad thing.
- The changes in audit_free() look to be misplaced?  No?

*** [PATCH ghak90 v11 10/11] audit: track container nesting
- In the commit description the network namespace example talks about
"... spawning peer tasks 2 and 3 ..." when I think it should read "...
spawning peer tasks in audit container ID 2 and 3 ...".
- Is the change in audit_receive_msg(AUDIT_SIGNAL_INFO2) misplaced?
- AUDIT_MESSAGE_TEXT_MAX in audit_receive_msg() is *really* big, like
multiple pages big on a 4k PAGE_SIZE system, perhaps start with
something like "(AUDIT_MESSAGE_TEXT_MAX < PAGE_SIZE ?
AUDIT_MESSAGE_TEXT_MAX : PAGE_SIZE)"?
- In audit_set_contid() couldn't you use "current" in the call to
_audit_contobj_get_bytask()?
- It looks like you drop the audit_contobj->parent reference in the
signal caller case, but I don't see where you drop it in the normal
case?  I suspect it would be in the same place where you drop
audit_contobj->owner.
- Perhaps I've been looking at these patches too long today, but can
you explain why the change to audit_contid_comparator is necessary?
It's not clear to me at the moment.

*** [PATCH ghak90 v11 11/11] audit: add capcontid to set contid
outside init_user_ns
- The access control logic in audit_set_capcontid() doesn't seem to
match the commit description, I was expecting something more like
this:

  int audit_set_capcontid(...)
  {
    /* ... */

    rc = -EPERM;
    if (tsk == current || !task_is_descendant(current, tsk))
      rc = -EXDEV;
    else if (current_user_ns() == &init_user_ns &&
             capable(CAP_AUDIT_CONTROL))
      rc = 0;
    else if (audit_get_capcontid(current))
      rc = 0;

    if (!rc)
      info->capcontid = !!enable;

    /* ... */

  }

--
paul moore
www.paul-moore.com

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

Audit container ID patchset v11 comments