Re: [PATCH 2/9] Implement containers as kernel objects

Thursday, 14 September 2017

On 2017-09-06 09:03, Serge E. Hallyn wrote:
...
 Quoting Richard Guy Briggs (rgb(a)redhat.com):
 ...
 > > I believe we are going to need a container ID to container definition
 > > (namespace, etc.) mapping mechanism regardless of if the container ID
 > > is provided by userspace or a kernel generated serial number.  This
 > > mapping should be recorded in the audit log when the container ID is
 > > created/defined.
 > 
 > Agreed.
 > 
 > > > As was suggested in one of the previous threads, if there are any events not
 > > > associated with a task (incoming network packets) we log the namespace ID and
 > > > then only concern ourselves with its container serial number or container name
 > > > once it becomes associated with a task at which point that tracking will be
 > > > more important anyways.
 > > 
 > > Agreed.  After all, a single namespace can be shared between multiple
 > > containers.  For those security officers who need to track individual
 > > events like this they will have the container ID mapping information
 > > in the logs as well so they should be able to trace the unassociated
 > > event to a set of containers.
 > > 
 > > > I'm not convinced that a userspace or kernel generated UUID is that useful
 > > > since they are large, not human readable and may not be globally unique given
 > > > the "pets vs cattle" direction we are going with potentially identical
 > > > conditions in hosts or containers spawning containers, but I see no need to
 > > > restrict them.
 > > 
 > > From a kernel perspective I think an int should suffice; after all,
 > > you can't have more containers than you have processes.  If the
 > > container engine requires something more complex, it can use the int
 > > as input to its own mapping function.
 > 
 > PIDs roll over.  That already causes some ambiguity in reporting.  If a
 > system is constantly spawning and reaping containers, especially
 > single-process containers, I don't want to have to worry about that ID
 > rolling to keep track of it even though there should be audit records of
 > the spawn and death of each container.  There isn't significant cost
 > added here compared with some of the other overhead we're dealing with.

 Strawman proposal:

 1. Each clone/unshare/setns involving a namespace type generates an audit
 message along the lines of:

 PID 9512 (pid in init_pid_ns) in auditnsid 00000001 cloned CLONE_NEWNS|CLONE_NEWNET
 new auditnsid: 00000002
 associated namespaces: (list of all namespace filesystem inode numbers) 
As you will have seen, this is pretty much what my most recent proposal suggests.
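
To make the shape of that record concrete, here is a rough userspace model of
it.  This is only a sketch under my own assumptions: the names (NsEvent,
AuditNsidAllocator, format_event) are hypothetical, the 8-digit decimal
rendering of the auditnsid is a guess based on the example above, and the
inode numbers in the usage note are made up.  It is not the kernel
implementation.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class NsEvent:
    """One clone/unshare/setns event touching namespaces (hypothetical model)."""
    pid: int            # PID as seen in init_pid_ns
    parent_nsid: int    # auditnsid the task was in
    new_nsid: int       # auditnsid assigned to the new namespace set
    flags: tuple        # e.g. ("CLONE_NEWNS", "CLONE_NEWNET")
    ns_inodes: tuple    # namespace filesystem inode numbers


class AuditNsidAllocator:
    """Hands out serial numbers that are never reused within one 'boot'."""

    def __init__(self):
        # Monotonic counter; a fresh allocator stands in for a reboot.
        self._next = itertools.count(1)

    def allocate(self):
        return next(self._next)


def format_event(ev):
    """Render an event along the lines of the strawman message."""
    return (
        f"PID {ev.pid} in auditnsid {ev.parent_nsid:08d} "
        f"cloned {'|'.join(ev.flags)}\n"
        f"new auditnsid: {ev.new_nsid:08d}\n"
        f"associated namespaces: {' '.join(str(i) for i in ev.ns_inodes)}"
    )
```

Allocating two IDs and formatting an NsEvent for PID 9512 with
("CLONE_NEWNS", "CLONE_NEWNET") gives a first line matching the example in the
strawman.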

...
 2. Userspace (i.e. the container logging daemon here) can watch the audit log
 for all messages relating to auditnsid 00000002.  Presumably there will be
 messages along the lines of "PID 9513 in auditnsid 00000002 cloned...".  The
 container logging daemon can track those messages and add the new auditnsids
 to the list it watches. 
Yes.
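
The tracking described in step 2 could look roughly like the following.  It
is a minimal sketch, assuming the daemon has already reduced each clone record
to a (parent auditnsid, new auditnsid) pair and sees the pairs in log order;
the function name and the pair representation are mine, not part of any
existing audit tool.

```python
def follow(records, root_nsid):
    """Return every auditnsid descended from root_nsid, including itself.

    records: iterable of (parent_nsid, new_nsid) pairs in log order.
    Because a child can only be cloned after its parent exists, a single
    forward pass over the log is enough to pick up all descendants.
    """
    watched = {root_nsid}
    for parent, child in records:
        if parent in watched:
            # A task in a watched auditnsid cloned: start watching the child.
            watched.add(child)
    return watched
```

For example, given the pairs (1, 2), (2, 3), (1, 4), (3, 5), (4, 6) and a
starting auditnsid of 2, the watched set ends up as 2, 3 and 5; auditnsids 4
and 6 belong to a sibling and are ignored.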

...
 3. If a container is migrated (checkpointed and restored here or elsewhere),
 userspace can just follow the appropriate logs for the new containers. 
Yes.

...
 Userspace does not ever *request* an auditnsid.  They are ephemeral, just a
 tool to track the namespaces through the audit log.  They are however guaranteed
 to never be re-used until reboot. 
Well, this is where things get controversial...  I had wanted this, a
kernel-generated serial number unique to a running kernel to track every
container initiation, but this does have some CRIU challenges pointed
out by Eric Biederman.  Nested containers will not have a consistent
view on a new host and no way to make it consistent.  If we could
guarantee that containers would never be nested, this could be workable.
I think nesting is inevitable in the future given the variety and
creativity of the orchestration tools, so restricting this seems
short-sighted.

At the moment the approach is to let the orchestrator determine the ID of
a container.  Some have argued for as small as u32 and others for a full
UUID.  A u32 runs the risk of rolling, so a u64 seems like a reasonable
step to solve that issue.  Others would like to be able to store a full
UUID, which seemed like a good idea at the outset, but on further
thought this is something the orchestrator can manage itself.  That
minimises the number of bits of information required per audit record
while still guaranteeing we can identify the provenance of a particular
audit event.  Let's see if we can make it work with a u64.
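
For a feel of the rollover risk, here is some back-of-the-envelope arithmetic.
It assumes IDs are handed out sequentially at a steady rate, which is my
simplification (an orchestrator is free to pick IDs any way it likes), and the
1000 containers per second figure is a made-up stress number rather than a
measurement.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def seconds_until_wrap(bits, ids_per_second):
    """Time to exhaust a sequential ID space of the given width."""
    return (2 ** bits) / ids_per_second


rate = 1000  # hypothetical container creations per second

u32_days = seconds_until_wrap(32, rate) / SECONDS_PER_DAY
u64_years = seconds_until_wrap(64, rate) / SECONDS_PER_YEAR

print(f"u32 wraps after about {u32_days:.1f} days")
print(f"u64 wraps after about {u64_years:.3g} years")
```

At that rate a u32 wraps in under two months, while a u64 lasts for hundreds
of millions of years, which is why the extra 32 bits per record look like a
cheap way to stop worrying about reuse.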

...
 (Feels like someone must have proposed this before) 
These ideas have been thrown around a few times and I'm starting to
understand them better.

...
 -serge 
- RGB

--
Richard Guy Briggs <rgb(a)redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
