On 14/08/28, Eric W. Biederman wrote:
Richard Guy Briggs <rgb(a)redhat.com> writes:
> On 14/08/23, Eric W. Biederman wrote:
>> Richard Guy Briggs <rgb(a)redhat.com> writes:
>>
>> > Generate and assign a serial number per namespace instance since boot.
>> >
>> > Use a serial number per namespace (unique across one boot of one kernel)
>> > instead of the inode number (which is claimed to have had the right to
>> > change reserved and is not necessarily unique if there is more than one
>> > proc fs) to uniquely identify it per kernel boot.
>>
>> This approach is just broken.
>>
>> For this to work with migration (aka criu) you need to implement a
>> namespace of namespaces. You haven't done this, and therefore
>> such an interface will break existing userspace.
>>
>> Inside of audit I can understand not caring about these issues,
>> but you go forward and expose these serial numbers in proc,
>> and generally make this infrastructure available to others.
>>
>> The deep issue with migration is that we move tasks from one machine
>> to another and on the destination machine we need to have all of the
>> same global identifiers for software to function properly.
>>
>> My weasel words around the proc inode numbers are there to preserve
>> room for us to restore those ids if it ever becomes relevant for
>> migration.
>
> What do you do if the inode number is already in use on the target
> host?
Since the inode numbers are relative to a superblock or a pid namespace
the numbers that are in use can be restored on the target system
by creating them in the appropriate namespace.
So you seem to be advocating for a namespace of namespaces, since
neither host can create a new namespace without consulting the others in
its pool for a new free number.
The support does not exist in the kernel today for doing that because no
one has cared, but as architected the support can be added if needed to
support migration.
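
To make the point above concrete, here is a minimal sketch (in Python, not
kernel code) of why identifiers that are relative to a context can be
restored on a target host while host-global ones may collide. The class
IdContext and all names in it are hypothetical and exist only for this
illustration; nothing here corresponds to an existing kernel interface.

```python
class IdContext:
    """Hypothetical table of identifiers scoped to one context.

    Stands in for "a superblock or a pid namespace" in the discussion;
    it is an illustration only, not a kernel data structure.
    """

    def __init__(self, first_id=1):
        self._next = first_id
        self._in_use = set()

    def allocate(self):
        # Hand out the next free number, skipping any restored ones.
        while self._next in self._in_use:
            self._next += 1
        new_id = self._next
        self._in_use.add(new_id)
        self._next += 1
        return new_id

    def restore(self, wanted_id):
        # Re-create a specific number, as a migration tool would need to.
        if wanted_id in self._in_use:
            raise ValueError(f"id {wanted_id} already in use in this context")
        self._in_use.add(wanted_id)
        return wanted_id


# Source host: the container's ids live in their own context.
source_container = IdContext()
ids_on_source = [source_container.allocate() for _ in range(3)]

# Target host: a single host-wide table already uses the same numbers.
target_global = IdContext()
for _ in range(3):
    target_global.allocate()

# Restoring into the host-wide table collides ...
try:
    target_global.restore(ids_on_source[0])
    collided = False
except ValueError:
    collided = True

# ... while a fresh context for the migrated container does not.
target_container = IdContext()
ids_on_target = [target_container.restore(i) for i in ids_on_source]
```

With a per-boot global serial number there is only the equivalent of
target_global, which is the case where the restore fails.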
>> That is the proc inode numbers (technically) live in a pid namespace,
>> (aka a mount of proc). So depending on the pid namespace you are in
>> or the mount of proc you look in the numbers could change.
>>
>> Qualifications like that must exist to have a prayer of ever supporting
>> process migration in the crazy corner cases where people start caring
>> about inode numbers.
>>
>> We currently don't and inode numbers for a namespace will never change
>> after a namespace is created. So I think you really are ok using the
>> proc inode numbers. I am happy declaring by fiat that the inode numbers
>> that audit uses are the numbers connected to the initial pid namespace.
>
> But once a namespace/container is migrated, it is a different audit that
> is looking at it (unless we create an audit manager or entity that
> functions at the level of a container manager), so audit should not care.
These numbers were exported to everyone as a general purpose facility in
proc. If audit is global and audit doesn't migrate you are right it
doesn't matter. However if these numbers are used by anyone else for
anything else it causes a problem.
So let us restrict their use to audit, by removing them from
/proc/<pid>/ns/ and only exposing them via netlink calls to audit gated
by CAP_AUDIT_WRITE or CAP_AUDIT_CONTROL.
Further given that people run entire distributions in containers we may
reach the point where we wish to run auditd in a container in the
future.  I would hate to paint ourselves into a corner with a design
that could never allow audit to migrate.  Supporting that case someday
seems a valid naive desire.
Agreed. That is an option we do not want to rule out at this point.
I'll need to think about this one more.
>> At a fairly basic level anything that is used to identify namespaces for
>> any general purpose use needs to have most if not all of the same
>> properties of the proc inode numbers.  The most important of which is
>> being tied to some context/namespace so there is an ability if we ever
>> need it to migrate those numbers from one machine to another.
>
> Sooo... does it make any sense to have those inode or serial numbers be
> blank inside the namespace/container itself, but only visible to its
> manager outside the container (unless it is the initial namespace)?
Mostly I think it makes sense to use the inode numbers from the initial
pid namespace. They already exist. They already are unique. (Which
means I don't need to maintain more code and more special cases). And
they do what you need now.
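
For reference, a small sketch of how userspace can read the namespace
inode numbers under discussion. It assumes a kernel where the links in
/proc/<pid>/ns/ read as "type:[inode]" (for example "pid:[4026531836]");
the helper names parse_ns_link and namespace_inode are made up for this
example. The number returned is the one seen through whichever proc mount
the caller is looking at, which is the qualification made earlier in the
thread.

```python
import os
import re

# Link targets under /proc/<pid>/ns/ look like "pid:[4026531836]".
_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link format: {target!r}")
    return match.group(1), int(match.group(2))


def namespace_inode(pid, ns_type):
    """Return the inode number of one namespace of a process.

    pid may be a number or the string "self"; ns_type is a name such as
    "pid", "net" or "mnt".
    """
    path = f"/proc/{pid}/ns/{ns_type}"
    return parse_ns_link(os.readlink(path))[1]


if __name__ == "__main__":
    try:
        print("pid namespace inode:", namespace_inode("self", "pid"))
    except OSError as exc:
        # Not on Linux, or proc is not mounted here.
        print("no namespace link available:", exc)
```

Calling os.stat() on the same path and reading st_ino should give the same
number as the one embedded in the link text.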
Will inode numbers never be re-used once they are freed? Guaranteed?
I probably haven't followed closely enough but I don't see what makes
inode numbers undesirable.
This posting:
https://www.redhat.com/archives/linux-audit/2013-March/msg00032.html
Eric
- RGB
--
Richard Guy Briggs <rbriggs(a)redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545