On Thu, May 30, 2019 at 03:29:32PM -0400, Paul Moore wrote:
[REMINDER: It is an "*audit* container ID" and not a general
"container ID" ;) Smiley aside, I'm not kidding about that part.]
This sort of seems like a distinction without a difference; presumably
audit is going to want to differentiate between everything that people
in userspace call a container. So you'll have to support all this
insanity anyway, even if it's "not a container ID".
I'm not interested in supporting/merging something that isn't useful;
if this doesn't work for your use case then we need to figure out what
would work. It sounds like nested containers are much more common in
the lxc world, can you elaborate a bit more on this?
As far as the possible solutions you mention above, I'm not sure I
like the per-userns audit container IDs, I'd much rather just emit the
necessary tracking information via the audit record stream and let the
log analysis tools figure it out. However, the bigger question is how
to limit (re)setting the audit container ID when you are in a non-init
userns. For reasons already mentioned, using capable() is a
non-starter for everything but the initial userns, and using ns_capable()
is equally poor as it essentially allows any userns the ability to
munge its audit container ID (obviously not good). It appears we
need a different method for controlling access to the audit container
ID.
One option would be to make it a string, and have it be append only.
That should be safe with no checks.
I know there was a long thread about what type to make this thing. I
think you could accomplish the append-only-ness with a u64 if you had
some rule about only allowing setting lower order bits than those that
are already set. With 4 bits for simplicity:
    1100          # initial container id
    1100 -> 1011  # not allowed
    1100 -> 1101  # allowed, but now 1101 is set in stone since there are
                  # no lower order bits left
There are probably fancier ways to do it if you actually understand
math :)
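To make the rule concrete, here is a rough userspace sketch of the check,
untested against any real kernel code. The name contid_update_allowed() is
made up for illustration, and treating an ID of 0 as "not set yet" is an
assumption on my part, not something the audit patches define.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: may old_id be changed to new_id under the
 * "only set bits below the lowest bit already set" rule?
 *
 * Assumption: an ID of 0 means "unset", so any first value is accepted.
 */
bool contid_update_allowed(uint64_t old_id, uint64_t new_id)
{
	/* Isolate the lowest set bit of old_id (0 if old_id is 0). */
	uint64_t lowest = old_id & (~old_id + 1);

	/*
	 * Mask of the bits strictly below that bit. For old_id == 0 the
	 * subtraction wraps to all ones, so every bit is still writable.
	 */
	uint64_t writable = lowest - 1;

	/* Everything outside the writable bits must be left untouched. */
	return (new_id & ~writable) == old_id;
}
```

With the 4-bit example above, 1100 -> 1011 is rejected and 1100 -> 1101 is
accepted, after which 1101 can no longer change. Re-setting the same value
passes the check as a no-op.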
Since userns nesting is limited to 32 levels (right now, IIRC), and
you have 64 bits, this might be reasonable. You could just teach
container engines to use the first say N bits for themselves, with a 1
bit for the barrier at the end.
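One way to read the "N bits plus a barrier" idea, again as an untested
sketch: each engine claims the next nbits below the current lowest set bit,
using nbits - 1 of them as payload and forcing the last one to 1. The
function names and the 0-means-unset convention are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Position of the lowest set bit, or 64 if no bit is set (ID unset). */
static unsigned lowest_set_bit_pos(uint64_t v)
{
	unsigned pos = 0;

	if (v == 0)
		return 64;
	while (!(v & 1)) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Hypothetical helper: append one nesting level to parent.
 * The level occupies nbits: (nbits - 1) payload bits followed by a
 * trailing 1 that acts as the barrier for the next level down.
 * Returns false if there is no room left or the payload does not fit.
 */
bool contid_append_level(uint64_t parent, uint64_t payload,
			 unsigned nbits, uint64_t *child)
{
	unsigned room = lowest_set_bit_pos(parent);
	uint64_t field;

	if (nbits == 0 || nbits > room)
		return false;
	if ((payload >> (nbits - 1)) != 0)
		return false;

	field = (payload << 1) | 1;	/* payload bits, then the barrier */
	*child = parent | (field << (room - nbits));
	return true;
}
```

Starting from an unset ID with nbits = 4, payload 110 gives 1101 in the top
four bits, and a nested engine with payload 001 then adds 0011 in the next
four. Since every level only sets bits below the previous barrier, each step
would also pass the lower-order-bits rule described earlier.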
Tycho