>> The registration is a pseudo filesystem (proc, since PID tree already
>> exists) write of a u8[16] UUID representing the container ID to a file
>> representing a process that will become the first process in a new
>> container. This write might place restrictions on mount namespaces
>> required to define a container, or at least careful checking of
>> namespaces in the kernel to verify permissions of the orchestrator so it
>> can't change its own container ID. A bind mount of nsfs may be
>> necessary in the container orchestrator's mntNS.
>> Note: Use a 128-bit scalar rather than a string to make compares faster
>> and simpler.
>>
>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>> registration.
>
> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.

No, because then any process with that capability (vsftpd) could change
its own container ID. This is discussed more in other parts of the
thread...

Not if we make the container ID append-only (to support nesting), or
write-once (the other idea thrown around). In that case, you can't move
"out" from a particular container ID, you can only go "deeper". These
semantics don't make sense for generic containers, but since the point
of this facility is *specifically* audit, I imagine that not being
able to move a process out of a sub-container's ID is a benefit.
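
To make the two options concrete, here is a rough userspace sketch in C.
It is not kernel code and not a proposed patch: the struct names, the
helper names, the nesting limit and the errno choices are all made up
for illustration. It only shows that neither variant offers a way to
leave a container ID once it is set, and that a u8[16] ID compares with
a single fixed-size memcmp().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CONTID_LEN       16  /* u8[16] UUID, as in the proposal */
#define CONTID_MAX_DEPTH 8   /* hypothetical nesting limit */

/* Hypothetical container ID: 16 raw bytes, not a string. */
struct contid {
	uint8_t b[CONTID_LEN];
};

/* Hypothetical per-task state: a stack of IDs, outermost first. */
struct task_contid {
	struct contid stack[CONTID_MAX_DEPTH];
	int depth;  /* 0 means the task is not in any container */
};

/* Fixed-size compare, no string handling needed. */
static bool contid_equal(const struct contid *a, const struct contid *b)
{
	return memcmp(a->b, b->b, CONTID_LEN) == 0;
}

/* Write-once variant: the ID can be set exactly one time. */
static int contid_set_once(struct task_contid *t, const struct contid *id)
{
	if (t->depth != 0)
		return -EEXIST;  /* already set, refuse to change it */
	t->stack[0] = *id;
	t->depth = 1;
	return 0;
}

/*
 * Append-only variant: a new ID can only be pushed on top of the
 * existing ones, so a task can go "deeper" but never back "out".
 * There is deliberately no pop operation.
 */
static int contid_push(struct task_contid *t, const struct contid *id)
{
	if (t->depth >= CONTID_MAX_DEPTH)
		return -ENOSPC;  /* hypothetical limit reached */
	t->stack[t->depth++] = *id;
	return 0;
}
```

With either helper, a process that happens to hold the capability can at
most nest itself further (append-only) or fail outright (write-once); it
cannot replace the ID that audit already associates with it.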
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/