Re: [PATCH 00/34] fs: idmapped mounts

Thursday, 29 October 2020

...
 On Oct 28, 2020, at 5:35 PM, Christian Brauner
<christian.brauner(a)ubuntu.com> wrote:

 Hey everyone,

 I vanished for a little while to focus on this work here so sorry for
 not being available by mail for a while.

 For quite a long time we have had issues with sharing mounts between
 multiple unprivileged containers with different id mappings, sharing a
 rootfs between multiple containers with different id mappings, and also
 sharing regular directories and filesystems between users with different
 uids and gids. The latter use-cases have become even more important with
 the availability and adoption of systemd-homed (cf. [1]) to implement
 portable home directories.

 The solutions we have tried and proposed so far include the introduction
 of fsid mappings, a tiny overlay based filesystem, and an approach to
 call override creds in the vfs. None of these solutions have covered all
 of the above use-cases.

 The solution proposed here has its origins in multiple discussions
 at Linux Plumbers 2017, during and after the containers
 microconference.
 To the best of my knowledge this involved Aleksa, Stéphane, Eric, David,
 James, and myself. A variant of the solution proposed here has also been
 discussed, again to the best of my knowledge, after a Linux conference
 in St. Petersburg in Russia between Christoph, Tycho, and myself in 2017
 after Linux Plumbers.
 I've taken the time to finally implement a working version of this
 solution over the last weeks to the best of my abilities. Tycho has
 signed up for this slightly crazy endeavour as well and he has helped
 with the conversion of the xattr codepaths.

 The core idea is to make idmappings a property of struct vfsmount
 instead of tying it to a process being inside of a user namespace which
 has been the case for all other proposed approaches.
 It means that idmappings become a property of bind-mounts, i.e. each
 bind-mount can have a separate idmapping. This has the obvious advantage
 that idmapped mounts can be created inside of the initial user
 namespace, i.e. on the host itself instead of requiring the caller to be
 located inside of a user namespace. This enables such use-cases as e.g.
 making a usb stick available in multiple locations with different
 idmappings (see the vfat port that is part of this patch series).

 The vfsmount struct gains a new struct user_namespace member. The
 idmapping of the user namespace becomes the idmapping of the mount. A
 caller that is either privileged with respect to the user namespace of
 the superblock of the underlying filesystem or a caller that is
 privileged with respect to the user namespace a mount has been idmapped
 with can create a new bind-mount and mark it with a user namespace. 
So one way of thinking about this is that a user namespace that has an idmapped mount can,
effectively, create or chown files with *any* on-disk uid or gid by doing it directly (if
that uid exists in-namespace, which is likely for interesting ids like 0) or by creating a
new userns with that id inside.

For a file system that is private to a container, this seems moderately safe, although
this may depend on what exactly “private” means. We probably want a mechanism such that,
if you are outside the namespace, a reference to a file with the namespace’s vfsmnt does
not confer suid privilege.

Imagine the following attack: user creates a namespace with a root user and arranges to
get an idmapped fs, e.g. by inserting an ext4 usb stick or using whatever container
management tool does this.  Inside the namespace, the user creates a suid-root file.

Now, outside the namespace, the user has privilege over the namespace.  (I’m assuming
there is some tool that will idmap things in a namespace owned by an unprivileged user,
which seems likely.) So the user makes a new bind mount and idmaps it to the init
namespace. Game over.

So I think we need to have some control to mitigate this in a comprehensible way. A big
hammer would be to require nosuid. A smaller hammer might be to say that you can’t create
a new idmapped mount unless you have privilege over the userns that you want to use for
the idmap and to say that a vfsmnt’s paths don’t do suid outside the idmap namespace.  We
already do the latter for the vfsmnt’s mntns’s userns.

Hmm.  What happens if we require that an idmap userns equal the vfsmnt’s mntns’s userns? 
Is that too limiting?

I hope that whatever solution gets used is straightforward enough to wrap one’s head
around.

...
 When a file/inode is accessed through an idmapped mount the i_uid and
 i_gid of the inode will be remapped according to the user namespace the
 mount has been marked with. When a new object is created based on the
 fsuid and fsgid of the caller they will similarly be remapped according
 to the user namespace of the mount they are created from.
By “mapped according to”, I presume you mean that the on-disk uid/gid is the uid/gid as seen
in the user namespace in question.
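
To make that reading concrete, here is a rough userspace sketch of it, not code from the patch series. It assumes the mount's idmapping is a list of extents like the lines of a uid_map file, and that the on-disk id plays the role of the in-namespace id. The struct, the function names, and the example mapping (0..65535 inside, 100000..165535 outside) are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1)

/* Hypothetical extent: ids [inside, inside + count) in the namespace
 * correspond to ids [outside, outside + count) outside of it. */
struct id_extent {
    uint32_t inside;
    uint32_t outside;
    uint32_t count;
};

/* Assumed example mapping, not taken from the thread:
 * namespace ids 0..65535 <-> outside ids 100000..165535. */
static const struct id_extent example_map[] = {
    { 0, 100000, 65536 },
};
static const size_t example_map_len =
    sizeof(example_map) / sizeof(example_map[0]);

/* On-disk id, read as an in-namespace id, to the id seen through the mount. */
static uint32_t disk_to_mount(const struct id_extent *map, size_t n,
                              uint32_t disk_id)
{
    for (size_t i = 0; i < n; i++) {
        if (disk_id >= map[i].inside &&
            disk_id - map[i].inside < map[i].count)
            return map[i].outside + (disk_id - map[i].inside);
    }
    return INVALID_ID; /* no extent covers this id */
}

/* Caller's fsuid/fsgid to the id that would be written to disk on create. */
static uint32_t mount_to_disk(const struct id_extent *map, size_t n,
                              uint32_t caller_id)
{
    for (size_t i = 0; i < n; i++) {
        if (caller_id >= map[i].outside &&
            caller_id - map[i].outside < map[i].count)
            return map[i].inside + (caller_id - map[i].outside);
    }
    return INVALID_ID; /* caller's id has no mapping in this mount */
}
```

Under this assumed mapping, a file stored with uid 0 shows up as uid 100000 through the mount, and a file created by a caller with fsuid 100000 lands on disk as uid 0. If the series maps in the other direction, the two helpers simply swap roles.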
