Re: [PATCH v2 14/39] commoncap: handle idmapped mounts

Monday, 23 November 2020

On Sun, Nov 22, 2020 at 04:18:55PM -0500, Paul Moore wrote:
...
 On Sun, Nov 15, 2020 at 5:39 AM Christian Brauner
 <christian.brauner(a)ubuntu.com> wrote:
 > When interacting with user namespace aware and non-user namespace aware
 > filesystem capabilities, the vfs will perform various security checks to
 > determine whether or not the filesystem capabilities can be used by the
 > caller (e.g. during exec), or even whether they need to be removed. The
 > main infrastructure for this resides in the capability codepaths but they
 > are called through the LSM security infrastructure even though they are not
 > technically an LSM or optional. This extends the existing security hooks
 > security_inode_removexattr(), security_inode_killpriv(),
 > security_inode_getsecurity() to pass down the mount's user namespace and
 > makes them aware of idmapped mounts.
 > In order to actually get filesystem capabilities from disk the capability
 > infrastructure exposes the get_vfs_caps_from_disk() helper. For user
 > namespace aware filesystem capabilities a root uid is stored alongside the
 > capabilities.
 > In order to determine whether the caller can make use of the filesystem
 > capability or whether it needs to be ignored it is translated according to
 > the superblock's user namespace. If it can be translated to uid 0 according
 > to that id mapping the caller can use the filesystem capabilities stored on
 > disk. If we are accessing the inode that holds the filesystem capabilities
 > through an idmapped mount we need to map the root uid according to the
 > mount's user namespace.
 > Afterwards the checks are identical to non-idmapped mounts. Reading
 > filesystem caps from disk enforces that the root uid associated with the
 > filesystem capability must have a mapping in the superblock's user
 > namespace and that the caller is either in the same user namespace or is a
 > descendant of the superblock's user namespace. For filesystems that are
 > mountable inside a user namespace, the container can just mount the filesystem
 > and won't usually need to idmap it. If it does create an idmapped mount it
 > can mark it with a user namespace it has created and which is therefore a
 > descendant of the s_user_ns. For filesystems that are not mountable inside
 > user namespaces the descendant rule is trivially true because the s_user_ns
 > will be the initial user namespace.
 >
 > If the initial user namespace is passed all operations are a nop so
 > non-idmapped mounts will not see a change in behavior and will also not see
 > any performance impact.
 >
 > Cc: Christoph Hellwig <hch(a)lst.de>
 > Cc: David Howells <dhowells(a)redhat.com>
 > Cc: Al Viro <viro(a)zeniv.linux.org.uk>
 > Cc: linux-fsdevel(a)vger.kernel.org
 > Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>

 ...

 > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
 > index 8dba8f0983b5..ddb9213a3e81 100644
 > --- a/kernel/auditsc.c
 > +++ b/kernel/auditsc.c
 > @@ -1944,7 +1944,7 @@ static inline int audit_copy_fcaps(struct audit_names *name,
 >         if (!dentry)
 >                 return 0;
 >
 > -       rc = get_vfs_caps_from_disk(dentry, &caps);
 > +       rc = get_vfs_caps_from_disk(&init_user_ns, dentry, &caps);
 >         if (rc)
 >                 return rc;
 >
 > @@ -2495,7 +2495,8 @@ int __audit_log_bprm_fcaps(struct linux_binprm *bprm,
 >         ax->d.next = context->aux;
 >         context->aux = (void *)ax;
 >
 > -       get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
 > +       get_vfs_caps_from_disk(mnt_user_ns(bprm->file->f_path.mnt),
 > +                              bprm->file->f_path.dentry, &vcaps);

 As audit currently records information in the context of the
 initial/host namespace I'm guessing we don't want the mnt_user_ns()
 call above; it seems like &init_user_ns would be the right choice
 (similar to audit_copy_fcaps()), yes?

Ok, sounds good. It also makes the patchset simpler.
Note that I'm currently not on the audit mailing list so this is likely
not going to show up there.

(Fwiw, I responded to you in your other mail too.)

Christian
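For illustration only, the root uid translation described in the quoted commit message can be modelled in user space. This is a minimal sketch, not the kernel implementation: the struct extent/struct idmap types and the map_down(), map_up() and fscaps_usable() helpers are hypothetical names, an idmap is assumed to follow the /proc/<pid>/uid_map "inside outside count" layout, and the final check is simplified to "maps to uid 0 in the caller's own namespace" rather than also walking ancestor namespaces as the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one uid_map line: "first lower_first count". */
struct extent {
	uint32_t first;       /* first id as seen inside the namespace */
	uint32_t lower_first; /* corresponding kernel-wide id */
	uint32_t count;       /* number of consecutive ids covered */
};

/* Hypothetical idmap: a small array of extents. */
struct idmap {
	const struct extent *ext;
	size_t nr;
};

#define INVALID_ID UINT32_MAX

/* Map an id from inside the namespace down to a kernel-wide id. */
static uint32_t map_down(const struct idmap *m, uint32_t id)
{
	if (id == INVALID_ID)
		return INVALID_ID;
	for (size_t i = 0; i < m->nr; i++) {
		const struct extent *e = &m->ext[i];
		if (id >= e->first && id - e->first < e->count)
			return e->lower_first + (id - e->first);
	}
	return INVALID_ID; /* no mapping */
}

/* Map a kernel-wide id up into the namespace. */
static uint32_t map_up(const struct idmap *m, uint32_t kid)
{
	if (kid == INVALID_ID)
		return INVALID_ID;
	for (size_t i = 0; i < m->nr; i++) {
		const struct extent *e = &m->ext[i];
		if (kid >= e->lower_first && kid - e->lower_first < e->count)
			return e->first + (kid - e->lower_first);
	}
	return INVALID_ID; /* no mapping */
}

/*
 * Hypothetical check: may filesystem caps whose on-disk root uid is
 * @rootid be used by the caller? Simplified to the caller's own
 * namespace only; ancestor namespaces are not considered.
 */
static int fscaps_usable(const struct idmap *sb_map,
			 const struct idmap *mnt_map,
			 const struct idmap *caller_map, uint32_t rootid)
{
	uint32_t kroot;

	assert(sb_map && mnt_map && caller_map);
	/* Translate the stored root uid via the superblock's namespace. */
	kroot = map_down(sb_map, rootid);
	/* Idmapped mount: translate again via the mount's namespace. */
	kroot = map_down(mnt_map, kroot);
	/* Usable only if the result is uid 0 in the caller's namespace. */
	return kroot != INVALID_ID && map_up(caller_map, kroot) == 0;
}

/* Identity map standing in for the initial user namespace. */
static const struct extent identity_ext[] = { { 0, 0, UINT32_MAX } };
static const struct idmap init_map = { identity_ext, 1 };
```

For example, with the superblock in the initial namespace, caps stored with root uid 0, and a container caller whose map is "0 100000 65536", the caps are usable only when the mount carries that same idmapping. With the identity map on the mount the extra translation is a no-op, which mirrors the commit message's point that non-idmapped mounts see no change in behavior.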
