>>> The registration is a pseudo filesystem (proc, since PID tree already
>>> exists) write of a u8[16] UUID representing the container ID to a file
>>> representing a process that will become the first process in a new
>>> container. This write might place restrictions on mount namespaces
>>> required to define a container, or at least careful checking of
>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>> can't change its own container ID. A bind mount of nsfs may be
>>> necessary in the container orchestrator's mntNS.
>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>> and simpler.
>>>
>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>> registration.
>>
>> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.
>
> No, because then any process with that capability (vsftpd) could change
> its own container ID. This is discussed more in other parts of the
> thread...

Not if we make the container ID append-only (to support nesting), or
write-once (the other idea thrown around). In that case, you can't move
"out" from a particular container ID, you can only go "deeper". These
semantics don't make sense for generic containers, but since the point
of this facility is *specifically* for audit I imagine that not being
able to move a process from a sub-container's ID is a benefit.
[This assumes it's CAP_AUDIT_CONTROL which is what we are discussing in
a sister thread.]
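
To make the two semantics concrete, here is a rough userspace sketch in C.
Everything in it is hypothetical: struct contid, struct task_audit,
struct contid_chain, CONTID_MAX_DEPTH and both helpers are names I made up
for illustration, not anything that exists in the kernel or in the
proposed patches. It only shows that neither variant has a path for
replacing or removing an ID that has already been set.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit container ID, stored as the u8[16] from the proposal. */
struct contid {
	uint8_t b[16];
};

/* Hypothetical per-task audit state; not an existing kernel structure. */
struct task_audit {
	struct contid contid;
	bool contid_set;
};

/* Write-once: the first write succeeds, every later write is refused. */
static int contid_set_once(struct task_audit *t, const struct contid *id)
{
	if (t->contid_set)
		return -EEXIST;
	memcpy(t->contid.b, id->b, sizeof(t->contid.b));
	t->contid_set = true;
	return 0;
}

/* Arbitrary nesting limit, chosen only for this sketch. */
#define CONTID_MAX_DEPTH 8

/* Append-only: IDs can be pushed (going "deeper") but never popped. */
struct contid_chain {
	struct contid ids[CONTID_MAX_DEPTH];
	unsigned int depth;
};

static int contid_append(struct contid_chain *c, const struct contid *id)
{
	if (c->depth >= CONTID_MAX_DEPTH)
		return -ENOSPC;
	c->ids[c->depth++] = *id;
	return 0;
}
```

With either helper, a process holding the capability can at most set an
unset ID or add a nested one; it cannot get back "out" to a parent ID.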
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/