Identifying thread/process termination
by L. A. Walsh
On 2020/11/16 05:43, Paul Moore wrote:
> The most important thing to keep in mind is that all of the threads
> inside a process share the same memory space. It is the lack of a
> strong, enforceable boundary between threads which makes it difficult,
> if not impossible, to view threads as individual entities from a
> security perspective.
---
Depends on how much your security policy relies on recognizing
abnormal behavior. If a program splits function across well defined
areas by a named thread, one may develop a baseline of "normal"
functionality associated with given threads. Determining that
a thread is operating outside its normal range can allow for
earlier detection and better monitoring of "safe" and/or secure
operation.
Understanding how programs operate, especially what work is
normal for a given thread, requires thread-level monitoring.
While given threads _can_ access global user memory, whether they
do depends on how they are coded or programmed to run. That, in
turn, can be used to help define boundaries and integrity levels
of various processes in a system.
For example, even though a logging thread might gather data
from other threads, knowing that it can only write output
to specific configured destinations would allow swift detection
of aberrations.
3 years, 11 months
-a never,exit still being logged
by Andreas Hasenack
Hi,
continuing my experiments in trying to reduce the auditd noise, I have
these two rules:
# auditctl -l
-a never,exit -F arch=b64 -S setsockopt -F a2=0x40 -F
exe=/sbin/iptables -F auid=-1
-a never,exit -F arch=b64 -S setsockopt -F a2=0x40 -F
exe=/sbin/xtables-multi -F auid=-1
I did use -F auid=4294967295 in the rules file, and auditd seems to
have understood that correctly as it's showing -1 in the rules list.
But this event is still being logged:
type=NETFILTER_CFG msg=audit(1605810940.198:1089): table=filter
family=2 entries=281
type=SYSCALL msg=audit(1605810940.198:1089): arch=c00000b7 syscall=208
success=yes exit=0 a0=4 a1=0 a2=40 a3=aaaaf478e680 items=0 ppid=7950
pid=31235 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0
sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iptables-restor"
exe="/sbin/xtables-multi" key=(null)
type=PROCTITLE msg=audit(1605810940.198:1089):
proctitle=69707461626C65732D726573746F7265002D2D6E6F666C757368002D2D766572626F7365002D2D77616974003130002D2D776169742D696E74657276616C003530303030
Same event, decoded with ausearch -i:
----
type=PROCTITLE msg=audit(11/19/20 18:35:40.198:1089) :
proctitle=iptables-restore --noflush --verbose --wait 10
--wait-interval 50000
type=SYSCALL msg=audit(11/19/20 18:35:40.198:1089) : arch=aarch64
syscall=setsockopt success=yes exit=0 a0=0x4 a1=ip
a2=IPT_SO_SET_REPLACE a3=0xaaaaf478e680 items=0 ppid=7950 pid=31235
auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root
sgid=root fsgid=root tty=(none) ses=unset comm=iptables-restor
exe=/sbin/xtables-multi key=(null)
type=NETFILTER_CFG msg=audit(11/19/20 18:35:40.198:1089) :
table=filter family=ipv4 entries=281
----
Why is it being logged, given that it matches the second (and last) rule I have?
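
For reference, the two raw fields that matter when comparing this event against the rules can be checked by hand. The sketch below is only an illustration (the helper name decode_proctitle is made up here, not part of any audit tool): it decodes the hex-encoded proctitle from the raw record, and confirms that the raw a2=40, which SYSCALL records print in hex without a prefix, is the same value as the a2=0x40 in the rules.

```python
def decode_proctitle(hex_value):
    """Decode a hex-encoded PROCTITLE value into its argument list.

    Hypothetical helper for illustration; ausearch -i does this for you.
    """
    raw = bytes.fromhex(hex_value)
    # The arguments are separated by NUL bytes in the encoded value.
    return [part.decode("utf-8", errors="replace")
            for part in raw.split(b"\x00") if part]


# proctitle value copied from the raw record above.
PROCTITLE = (
    "69707461626C65732D726573746F7265002D2D6E6F666C757368002D2D76657262"
    "6F7365002D2D77616974003130002D2D776169742D696E74657276616C00353030"
    "3030"
)

args = decode_proctitle(PROCTITLE)
print(" ".join(args))

# a2 in the raw SYSCALL record is hexadecimal, so "40" equals 0x40 (64).
a2_matches_rule = int("40", 16) == 0x40
print(a2_matches_rule)
```

This agrees with the ausearch -i output: the command line is iptables-restore with its arguments, and a2 does carry the value the rules filter on.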
3 years, 11 months
Default logging with no rules
by Andreas Hasenack
Hi there,
I started playing with the audit subsystem a few days ago, and noticed
that even without any rules, there is a lot of logging going on. I
understand that rules have to be fine tuned, and I was expecting
having to do that, but I wasn't expecting the amount of logs on a busy
system with no rules at all.
I read in an old presentation (~2011) that these come from "trusted
apps", and in fact any process with cap_audit_write (iirc) can log
such events. The tip was that exclude/never list/action could be used
to reduce this noise. Is that still the case and the recommended approach?
Or is there a way to use audit with only the rules defined in
/etc/audit/rules.d?
3 years, 11 months
Identifying thread/process termination
by Natan Yellin
Hello,
I've been tracking all process terminations using a rule for the exit and
exit_group syscalls. However, by looking at the audit events for exit it is
impossible to differentiate between the death of different threads in the
same thread group. Is there an alternative way to track this?
For my use case, I would like to know when either processes or individual
threads execute and terminate. (I'm fine tracking at either granularity.)
Right now I can track the creation properly using fork/clone/etc but for
termination I receive multiple exit events with identical information that
doesn't let me know which thread died.
Thanks,
Natan
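
The ambiguity described above can be reproduced in userspace. This is a minimal sketch, assuming the pid field in the exit records carries the process (thread group) ID, which is not confirmed in this thread: every thread in a process reports the same process ID, and only the kernel thread ID tells them apart. The function name collect_ids is made up for the example.

```python
import os
import threading


def collect_ids(n_threads=4):
    """Return the (pid, tid) pair seen by each of n_threads live threads."""
    barrier = threading.Barrier(n_threads)
    lock = threading.Lock()
    seen = []

    def worker():
        ids = (os.getpid(), threading.get_native_id())
        with lock:
            seen.append(ids)
        # Keep every thread alive until all have recorded their IDs,
        # so no kernel thread ID can be reused during this run.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


ids = collect_ids()
pids = {pid for pid, _ in ids}
tids = {tid for _, tid in ids}
print(len(pids), "distinct pid;", len(tids), "distinct thread IDs")
```

Any record that reports only the shared process ID cannot say which of these threads exited; a per-thread identifier is needed for that.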
3 years, 11 months
[PATCH v22 00/23] LSM: Module stacking for AppArmor
by Casey Schaufler
This patchset provides the changes required for
the AppArmor security module to stack safely with any other.
v22: Rebase to 5.10-rc1
v21: Rebase to 5.9-rc4
Incorporate feedback from v20
- Further revert UDS SO_PEERSEC to use scaffolding around
the interfaces that use lsmblobs and store only a single
secid. The possibility of multiple security modules
requiring data here is still a future problem.
- Incorporate Richard Guy Briggs' non-syscall auxiliary
records patch (patch 0019-0021) in place of my "supplemental"
records implementation. [I'm not sure I've given proper
attestation. I will correct as appropriate]
v20: Rebase to 5.9-rc1
Change the BPF security module to use the lsmblob data. (patch 0002)
Repair length logic in subject label processing (patch 0015)
Handle -EINVAL from the empty BPF setprocattr hook (patch 0020)
Correct length processing in append_ctx() (patch 0022)
v19: Rebase to 5.8-rc6
Incorporate feedback from v18
- Revert UDS SO_PEERSEC implementation to use lsmblobs
directly, rather than allocating as needed. The correct
treatment of out-of-memory conditions in the latter case
is difficult to define. (patch 0005)
- Use a size_t in append_ctx() (patch 0021)
- Fix a memory leak when creating compound contexts. (patch 0021)
Fix build error when CONFIG_SECURITY isn't set (patch 0013)
Fix build error when CONFIG_SECURITY isn't set (patch 0020)
Fix build error when CONFIG_SECURITY isn't set (patch 0021)
v18: Rebase to 5.8-rc3
Incorporate feedback from v17
- Null pointer checking in UDS (patch 0005)
Match changes in IMA code (patch 0012)
Fix the behavior of LSM context supplemental audit
records so that there's always exactly one when it's
appropriate for there to be one. This is a substantial
change that requires extension of the audit_context beyond
syscall events. (patch 0020)
v17: Rebase to 5.7-rc4
v16: Rebase to 5.6
Incorporate feedback from v15 - Thanks Stephen, Mimi and Paul
- Generally improve commit messages WRT scaffolding
- Comment ima_lsm_isset() (patch 0002)
- Some question may remain on IMA warning (patch 0002)
- Mark lsm_slot as __lsm_ro_after_init not __init_data (patch 0002)
- Change name of lsmblob variable in ima_match_rules() (patch 0003)
- Instead of putting a struct lsmblob into the unix_skb_parms
structure put a pointer to an allocated instance. There is
currently only space for 5 u32's in unix_skb_parms and it is
likely to get even tighter. Fortunately, the lifecycle
management of the allocated lsmblob is simple. (patch 0005)
- Dropped Acks due to the above change (patch 0005)
- Improved commentary on secmark labeling scaffolding. (patch 0006)
- Reduced secmark related labeling scaffolding. (patch 0006)
- Replace use of the zeroth entry of an lsmblob in scaffolding
with a function lsmblob_value() to hopefully make it less
obscure. (patch 0006)
- Convert security_secmark_relabel_packet to use lsmblob as
this reduces much of the most contentious scaffolding. (patch 0006)
- Dropped Acks due to the above change (patch 0006)
- Added BUILD_BUG_ON() for CIPSO tag 6. (patch 0018)
- Reworked audit subject information. Instead of adding fields in
the middle of existing records add a new record to the event. When
a separate record is required use subj="?". (patch 0020)
- Dropped Acks due to the above change (patch 0020)
- Reworked audit object information. Instead of adding fields in
the middle of existing records add a new record to the event. When
a separate record is required use obj="?". (patch 0021)
- Dropped Acks due to the above change (patch 0021)
- Enhanced documentation (patch 0022)
- Removed unnecessary error code check in security_getprocattr()
(patch 0021)
v15: Rebase to 5.6-rc1
- Revise IMA data use (patch 0002)
Incorporate feedback from v14
- Fix lockdown module registration naming (patch 0002)
- Revise how /proc/self/attr/context is gathered. (patch 0022)
- Revise access modes on /proc/self/attr/context. (patch 0022)
- Revise documentation on LSM external interfaces. (patch 0022)
v14: Rebase to 5.5-rc5
Incorporate feedback from v13
- Use an array of audit rules (patch 0002)
- Significant change, removed Acks (patch 0002)
- Remove unneeded include (patch 0013)
- Use context.len correctly (patch 0015)
- Reorder code to be more sensible (patch 0016)
- Drop SO_PEERCONTEXT as it's not needed yet (patch 0023)
v13: Rebase to 5.5-rc2
Incorporate feedback from v12
- Print lsmblob size with %z (Patch 0002)
- Convert lockdown LSM initialization. (Patch 0002)
- Restore error check in nft_secmark_compute_secid (Patch 0006)
- Correct blob scaffolding in ima_must_appraise() (Patch 0009)
- Make security_setprocattr() clearer (Patch 0013)
- Use lsm_task_display more widely (Patch 0013)
- Use passed size in lsmcontext_init() (Patch 0014)
- Don't add a smack_release_secctx() hook (Patch 0014)
- Don't print warning in security_release_secctx() (Patch 0014)
- Don't duplicate the label in nfs4_label_init_security() (Patch 0016)
- Remove reviewed-by as code has significant change (Patch 0016)
- Send the entire lsmblob for Tag 6 (Patch 0019)
- Fix description of socket_getpeersec_stream parameters (Patch 0023)
- Retain LSMBLOB_FIRST. What was I thinking? (Patch 0023)
- Add compound context to LSM documentation (Patch 0023)
v12: Rebase to 5.5-rc1
Fixed a couple of incorrect contractions in the text.
v11: Rebase to 5.4-rc6
Incorporate feedback from v10
- Disambiguate reading /proc/.../attr/display by restricting
all use of the interface to the current process.
- Fix a merge error in AppArmor's display attribute check
v10: Ask the security modules if the display can be changed.
v9: There is no version 9
v8: Incorporate feedback from v7
- Minor clean-up in display value management
- refactor "compound" context creation to use a common
append_ctx() function.
v7: Incorporate feedback from v6
- Make setting the display a privileged operation. The
availability of compound contexts reduces the need for
setting the display.
v6: Incorporate feedback from v5
- Add subj_<lsm>= and obj_<lsm>= fields to audit records
- Add /proc/.../attr/context to get the full context in
lsmname\0value\0... format as suggested by Simon McVittie
- Add SO_PEERCONTEXT for getsockopt() to get the full context
in the same format, also suggested by Simon McVittie.
- Add /sys/kernel/security/lsm_display_default to provide
the display default value.
v5: Incorporate feedback from v4
- Initialize the lsmcontext in security_secid_to_secctx()
- Clear the lsmcontext in all security_release_secctx() cases
- Don't use the "display" on strictly internal context
interfaces.
- The SELinux binder hooks check for cases where the context
"display" isn't compatible with SELinux.
v4: Incorporate feedback from v3
- Mark new lsm_<blob>_alloc functions static
- Replace the lsm and slot fields of the security_hook_list
with a pointer to a LSM allocated lsm_id structure. The
LSM identifies if it needs a slot explicitly. Use the
lsm_id rather than make security_add_hooks return the
slot value.
- Validate slot values used in security.c
- Reworked the "display" process attribute handling so that
it works right and doesn't use goofy list processing.
- fix display value check in dentry_init_security
- Replace audit_log of secids with '?' instead of deleting
the audit log
v3: Incorporate feedback from v2
- Make lsmblob parameter and variable names more
meaningful, changing "le" and "l" to "blob".
- Improve consistency of constant naming.
- Do more sanity checking during LSM initialization.
- Be a bit clearer about what is temporary scaffolding.
- Rather than clutter security_getpeersec_dgram with
otherwise unnecessary checks remove the apparmor
stub, which does nothing useful.
Patch 0001 moves management of the sock security blob
from the individual modules to the infrastructure.
Patches 0002-0011 replace system use of a "secid" with
a structure "lsmblob" containing information from the
security modules to be held and reused later. At this
point lsmblob contains an array of u32 secids, one "slot"
for each of the security modules compiled into the
kernel that used secids. A "slot" is allocated when
a security module requests one.
The infrastructure is changed to use the slot number
to pass the correct secid to or from the security module
hooks.
It is important that the lsmblob be a fixed size entity
that does not have to be allocated. Several of the places
where it is used would have performance and/or locking
issues with dynamic allocation.
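
As a rough mental model of the slot scheme described above (not the kernel code, which is C), the sketch below uses a fixed-size array of secids and a registry that hands out one slot index per requesting module. The names SlotRegistry, LsmBlob, LSMBLOB_ENTRIES and the secid values are assumptions made up for this illustration.

```python
LSMBLOB_ENTRIES = 3  # assumed: number of secid-using modules compiled in


class SlotRegistry:
    """Hands out one slot index per security module that requests one."""

    def __init__(self, size):
        self.size = size
        self.slots = {}

    def request_slot(self, lsm_name):
        if lsm_name in self.slots:
            return self.slots[lsm_name]
        if len(self.slots) >= self.size:
            raise RuntimeError("no free lsmblob slot")
        slot = len(self.slots)
        self.slots[lsm_name] = slot
        return slot


class LsmBlob:
    """Fixed-size container: one secid per slot, sized up front."""

    def __init__(self, size):
        self.secid = [0] * size


registry = SlotRegistry(LSMBLOB_ENTRIES)
selinux_slot = registry.request_slot("selinux")
apparmor_slot = registry.request_slot("apparmor")

blob = LsmBlob(LSMBLOB_ENTRIES)
blob.secid[selinux_slot] = 42   # made-up secid values
blob.secid[apparmor_slot] = 7


def secid_for(blob, registry, lsm_name):
    """Pick the secid belonging to one module, as the infrastructure would."""
    return blob.secid[registry.slots[lsm_name]]


print(secid_for(blob, registry, "apparmor"))
```

Because the blob is sized once and never grows, passing it around needs no allocation, which is the property the cover letter calls out.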
Patch 0012 provides a mechanism for a process to
identify which security module's hooks should be used
when displaying or converting a security context string.
A new interface /proc/self/attr/display contains the name
of the security module to show. Reading from this file
will present the name of the module, while writing to
it will set the value. Only names of active security
modules are accepted. Internally, the name is translated
to the appropriate "slot" number for the module which
is then stored in the task security blob. Setting the
display requires that all modules using the /proc interfaces
allow the transition. The "display" of other processes
can be neither read nor written. All suggested cases
for reading the display of a different process have race
conditions.
Patch 0013 starts the process of changing how a security
context is represented. Since it is possible for a
security context to have been generated by more than one
security module it is now necessary to note which module
created a security context so that the correct "release"
hook can be called. There are several places where the
module that created a security context cannot be inferred.
This is achieved by introducing a "lsmcontext" structure
which contains the context string, its length and the
"slot" number of the security module that created it.
The security_release_secctx() interface is changed,
replacing the (string,len) pointer pair with a lsmcontext
pointer.
Patches 0014-0016 convert the security interfaces from
(string,len) pointer pairs to a lsmcontext pointer.
The slot number identifying the creating module is
added by the infrastructure. Where the security context
is stored for extended periods the data type is changed.
The Netlabel code is converted to save lsmblob structures
instead of secids in Patch 0017. This is not strictly
necessary as there can only be one security module that
uses Netlabel at this point. Using a lsmblob is much
cleaner, as the interfaces that use the data have all
been converted.
Patch 0018 adds checks to the binder hooks which verify
that both ends of a transaction use the same "display".
Patches 0019-0021 add additional audit records for subject
and object LSM data when there are multiple security modules
with such data. The AUDIT_MAC_TASK_CONTEXTS record is
used in conjunction with a "subj=?" field to identify the
subject data. The AUDIT_MAC_OBJ_CONTEXTS record is used in
conjunction with an "obj=?" field to identify the object data.
The AUDIT_MAC_TASK_CONTEXTS record identifies the security
module with the data: "subj_selinux=xyz_t subj_apparmor=abc".
The AUDIT_MAC_OBJ_CONTEXTS record identifies the security
module with the data: "obj_selinux=xyz_t obj_apparmor=abc".
While AUDIT_MAC_TASK_CONTEXTS records will always contain
an entry for each possible security module, AUDIT_MAC_OBJ_CONTEXTS
records will only contain entries for security modules for
which the object in question has data.
An example of the MAC_TASK_CONTEXTS (1420) record is:
type=UNKNOWN[1420]
msg=audit(1600880931.832:113)
subj_apparmor==unconfined
subj_smack=_
An example of the MAC_OBJ_CONTEXTS (1421) record is:
type=UNKNOWN[1421]
msg=audit(1601152467.009:1050):
obj_selinux=unconfined_u:object_r:user_home_t:s0
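
A log consumer could pull the per-module labels out of records like the two above along these lines. This is a sketch only: the function name lsm_contexts and the regular expression are assumptions for illustration, and the records are joined onto single lines here for convenience.

```python
import re

# Matches subj_<lsm>=<value> and obj_<lsm>=<value> fields.
_LSM_FIELD = re.compile(r"\b(subj|obj)_(\w+)=(\S+)")


def lsm_contexts(record):
    """Return {"subj": {lsm: label}, "obj": {lsm: label}} for one record."""
    found = {"subj": {}, "obj": {}}
    for kind, lsm, value in _LSM_FIELD.findall(record):
        found[kind][lsm] = value
    return found


task_record = ("type=UNKNOWN[1420] msg=audit(1600880931.832:113) "
               "subj_apparmor==unconfined subj_smack=_")
obj_record = ("type=UNKNOWN[1421] msg=audit(1601152467.009:1050): "
              "obj_selinux=unconfined_u:object_r:user_home_t:s0")

print(lsm_contexts(task_record)["subj"])
print(lsm_contexts(obj_record)["obj"])
```

Note that the value after the first "=" is kept verbatim, so the AppArmor label in the first example comes out as "=unconfined".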
Patch 0022 adds a new interface for getting the
compound security contexts, /proc/self/attr/context.
An example of the content of this file is:
selinux\0one_u:one_r:one_t:s0-s0:c0.c1023\0apparmor\0unconfined\0
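
Reading that format back is a matter of splitting on NUL bytes and pairing names with values. A minimal sketch, assuming the content is exactly the lsmname\0value\0 sequence shown above; the function name parse_compound_context is made up for the example.

```python
def parse_compound_context(data):
    """Split lsmname\\0value\\0... bytes into (lsm, context) pairs."""
    fields = data.split(b"\x00")
    # A trailing NUL leaves one empty field at the end; drop it.
    if fields and fields[-1] == b"":
        fields.pop()
    if len(fields) % 2:
        raise ValueError("compound context has an unpaired field")
    return [(fields[i].decode(), fields[i + 1].decode())
            for i in range(0, len(fields), 2)]


sample = (b"selinux\x00one_u:one_r:one_t:s0-s0:c0.c1023\x00"
          b"apparmor\x00unconfined\x00")
pairs = parse_compound_context(sample)
for lsm, context in pairs:
    print(lsm, context)
```

On a kernel carrying this patchset the input would come from reading /proc/self/attr/context in binary mode rather than from the sample bytes.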
Finally, with all interference on the AppArmor hooks
removed, Patch 0023 removes the exclusive bit from
AppArmor. An unnecessary stub hook was also removed.
The Ubuntu project is using an earlier version of
this patchset in their distribution to enable stacking
for containers.
Performance measurements to date have the change
within the "noise". The sockperf and dbench results
are on the order of 0.2% to 0.8% difference, with
better performance being as common as worse. The
benchmarks were run with AppArmor and Smack on Ubuntu.
https://github.com/cschaufler/lsm-stacking.git#stack-5.10-rc1-v22
Signed-off-by: Casey Schaufler <casey(a)schaufler-ca.com>
---
3 years, 11 months
[PATCH] audit-testsuite: tests for subject and object correctness
by Casey Schaufler
Verify that there are subj= and obj= fields in a record
if and only if they are expected. A system without a security
module that provides these fields should not include them.
A system with multiple security modules providing these fields
(e.g. SELinux and AppArmor) should always provide "?" for the
data and also include an AUDIT_MAC_TASK_CONTEXTS or
AUDIT_MAC_OBJ_CONTEXTS record. The test uses the LSM list from
/sys/kernel/security/lsm to determine which format is expected.
Signed-off-by: Casey Schaufler <casey(a)schaufler-ca.com>
---
tests/Makefile | 1 +
tests/multiple_lsms/Makefile | 12 +++
tests/multiple_lsms/test | 166 +++++++++++++++++++++++++++++++++++
3 files changed, 179 insertions(+)
create mode 100644 tests/multiple_lsms/Makefile
create mode 100755 tests/multiple_lsms/test
diff --git a/tests/Makefile b/tests/Makefile
index a7f242a..253e906 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -18,6 +18,7 @@ TESTS := \
file_create \
file_delete \
file_rename \
+ multiple_lsms \
filter_exclude \
filter_saddr_fam \
filter_sessionid \
diff --git a/tests/multiple_lsms/Makefile b/tests/multiple_lsms/Makefile
new file mode 100644
index 0000000..c2a8e87
--- /dev/null
+++ b/tests/multiple_lsms/Makefile
@@ -0,0 +1,12 @@
+#
+# Copyright (C) Intel Corporation, 2020
+#
+
+TARGETS=$(patsubst %.c,%,$(wildcard *.c))
+
+LDLIBS += -lpthread
+
+all: $(TARGETS)
+clean:
+ rm -f $(TARGETS)
+
diff --git a/tests/multiple_lsms/test b/tests/multiple_lsms/test
new file mode 100755
index 0000000..c9afed5
--- /dev/null
+++ b/tests/multiple_lsms/test
@@ -0,0 +1,166 @@
+#!/usr/bin/perl
+#
+# Copyright (C) Intel Corporation, 2020
+#
+
+use strict;
+
+use Test;
+BEGIN { plan tests => 3 }
+
+use File::Temp qw/ tempdir tempfile /;
+
+###
+# functions
+
+sub key_gen {
+ my @chars = ( "A" .. "Z", "a" .. "z" );
+ my $key = "testsuite-" . time . "-";
+ $key .= $chars[ rand @chars ] for 1 .. 8;
+ return $key;
+}
+
+###
+# setup
+
+# reset audit
+system("auditctl -D >& /dev/null");
+
+my $line;
+my $lsm_out;
+my $lsm_count = 0;
+my $bpf_enabled = 0;
+
+open($lsm_out, "cat /sys/kernel/security/lsm |");
+while ( $line = <$lsm_out> ) {
+ if ( $line =~ /selinux/ ) {
+ $lsm_count = $lsm_count + 1;
+ }
+ if ( $line =~ /smack/ ) {
+ $lsm_count = $lsm_count + 1;
+ }
+ if ( $line =~ /apparmor/ ) {
+ $lsm_count = $lsm_count + 1;
+ }
+ if ( $line =~ /bpf/ ) {
+ $bpf_enabled = 1;
+ }
+}
+close($lsm_out);
+
+if ( $lsm_count and $bpf_enabled ) {
+ $lsm_count = $lsm_count + 1;
+}
+# create temp directory
+my $dir = tempdir( TEMPLATE => '/tmp/audit-testsuite-XXXX', CLEANUP => 1 );
+
+# create stdout/stderr sinks
+( my $fh_out, my $stdout ) = tempfile(
+ TEMPLATE => '/tmp/audit-testsuite-out-XXXX',
+ UNLINK => 1
+);
+( my $fh_err, my $stderr ) = tempfile(
+ TEMPLATE => '/tmp/audit-testsuite-err-XXXX',
+ UNLINK => 1
+);
+
+###
+# tests
+
+# create a test file
+( my $fh, my $filename ) =
+ tempfile( TEMPLATE => $dir . "/file-XXXX", UNLINK => 1 );
+
+# set the directory watch
+my $key = key_gen();
+system("auditctl -w $dir -k $key");
+
+# delete file
+unlink($filename);
+
+# make sure the records had a chance to bubble through to the logs
+system("auditctl -m syncmarker-$key");
+for ( my $i = 0 ; $i < 10 ; $i++ ) {
+ if ( system("ausearch -m USER | grep -q syncmarker-$key") == 0 ) {
+ last;
+ }
+ select( undef, undef, undef, 0.2 );    # built-in sleep() truncates 0.2 to 0
+}
+
+# test if we generate any audit records from the watch
+my $result = system("ausearch -i -k $key > $stdout 2> $stderr");
+ok( $result, 0 );
+
+# test if we generate a MAC_TASK_CONTEXTS record if and
+# only if it is required.
+#
+# test if we generate a MAC_OBJ_CONTEXTS record if and
+# only if it is required.
+
+my $found_auxsubj = 0;
+my $found_subjattr = 0;
+my $found_regsubj = 0;
+
+my $found_auxobj = 0;
+my $found_objattr = 0;
+my $found_regobj = 0;
+
+while ( $line = <$fh_out> ) {
+
+ if ( $line =~ / subj=\? / ) {
+ $found_auxsubj = 1;
+ } elsif ( $line =~ / subj=/ ) {
+ $found_regsubj = 1;
+ }
+ if ( $line =~ / subj_selinux=/ ) {
+ $found_subjattr = 1;
+ }
+ if ( $line =~ / subj_apparmor=/ ) {
+ $found_subjattr = 1;
+ }
+ if ( $line =~ / subj_smack=/ ) {
+ $found_subjattr = 1;
+ }
+
+ if ( $line =~ / obj=\? / ) {
+ $found_auxobj = 1;
+ } elsif ( $line =~ / obj=/ ) {
+ $found_regobj = 1;
+ }
+ if ( $line =~ / obj_selinux=/ ) {
+ $found_objattr = 1;
+ }
+ if ( $line =~ / obj_apparmor=/ ) {
+ $found_objattr = 1;
+ }
+ if ( $line =~ / obj_smack=/ ) {
+ $found_objattr = 1;
+ }
+}
+
+# three cases:
+# no subj= field or MAC_TASK_CONTEXTS when no supplying LSM
+# subj=$value field, no MAC_TASK_CONTEXTS for exactly one supplying LSM
+# subj=? field and a MAC_TASK_CONTEXTS for more than one supplying LSM
+#
+if ($lsm_count == 0) {
+ ok($found_regsubj == 0 and $found_auxsubj == 0);
+} elsif ($lsm_count == 1) {
+ ok($found_regsubj and $found_auxsubj == 0);
+} else {
+ ok($found_subjattr and $found_auxsubj);
+}
+
+if ($lsm_count == 0) {
+ ok($found_regobj == 0 and $found_auxobj == 0);
+} elsif ($lsm_count == 1) {
+ ok($found_regobj and $found_auxobj == 0);
+} else {
+ ok($found_objattr and $found_auxobj);
+}
+
+###
+# cleanup
+
+system("auditctl -D >& /dev/null");
+
--
2.24.1
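
The decision logic of the test above can be restated compactly. This is a sketch rather than a port: the function names are made up, and unlike the Perl test, which pattern-matches each name anywhere in the line, it splits the comma-separated contents of /sys/kernel/security/lsm into exact names.

```python
LABELING_LSMS = ("selinux", "smack", "apparmor")


def count_labeling_lsms(lsm_list):
    """Count the LSMs that supply subject/object data, as the test does."""
    names = lsm_list.strip().split(",")
    count = sum(1 for name in names if name in LABELING_LSMS)
    # The test counts bpf only when another labeling LSM is present.
    if count and "bpf" in names:
        count += 1
    return count


def expected_subject_format(count):
    """Map the LSM count to the subject format the test expects."""
    if count == 0:
        return "no subj= field"
    if count == 1:
        return "subj=<value>"
    return "subj=? plus MAC_TASK_CONTEXTS"


for lsms in ("capability,bpf",
             "lockdown,capability,yama,apparmor",
             "capability,selinux,apparmor"):
    n = count_labeling_lsms(lsms)
    print(lsms, "->", expected_subject_format(n))
```

The object side follows the same three cases with obj= and MAC_OBJ_CONTEXTS.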
3 years, 11 months
[PATCH v22 00/23] LSM: Module stacking for AppArmor
by Casey Schaufler
This patchset provides the changes required for
the AppArmor security module to stack safely with any other.
v22: Rebase to 5.10-rc1
v21: Rebase to 5.9-rc4
Incorporate feedback from v20
- Further revert UDS SO_PEERSEC to use scaffolding around
the interfaces that use lsmblobs and store only a single
secid. The possibility of multiple security modules
requiring data here is still a future problem.
- Incorporate Richard Guy Briggs' non-syscall auxiliary
records patch (patch 0019-0021) in place of my "supplimental"
records implementation. [I'm not sure I've given proper
attestation. I will correct as appropriate]
v20: Rebase to 5.9-rc1
Change the BPF security module to use the lsmblob data. (patch 0002)
Repair length logic in subject label processing (patch 0015)
Handle -EINVAL from the empty BPF setprocattr hook (patch 0020)
Correct length processing in append_ctx() (patch 0022)
v19: Rebase to 5.8-rc6
Incorporate feedback from v18
- Revert UDS SO_PEERSEC implementation to use lsmblobs
directly, rather than allocating as needed. The correct
treatment of out-of-memory conditions in the later case
is difficult to define. (patch 0005)
- Use a size_t in append_ctx() (patch 0021)
- Fix a memory leak when creating compound contexts. (patch 0021)
Fix build error when CONFIG_SECURITY isn't set (patch 0013)
Fix build error when CONFIG_SECURITY isn't set (patch 0020)
Fix build error when CONFIG_SECURITY isn't set (patch 0021)
v18: Rebase to 5.8-rc3
Incorporate feedback from v17
- Null pointer checking in UDS (patch 0005)
Match changes in IMA code (patch 0012)
Fix the behavior of LSM context supplimental audit
records so that there's always exactly one when it's
appropriate for there to be one. This is a substantial
change that requires extention of the audit_context beyond
syscall events. (patch 0020)
v17: Rebase to 5.7-rc4
v16: Rebase to 5.6
Incorporate feedback from v15 - Thanks Stephen, Mimi and Paul
- Generally improve commit messages WRT scaffolding
- Comment ima_lsm_isset() (patch 0002)
- Some question may remain on IMA warning (patch 0002)
- Mark lsm_slot as __lsm_ro_after_init not __init_data (patch 0002)
- Change name of lsmblob variable in ima_match_rules() (patch 0003)
- Instead of putting a struct lsmblob into the unix_skb_parms
structure put a pointer to an allocated instance. There is
currently only space for 5 u32's in unix_skb_parms and it is
likely to get even tighter. Fortunately, the lifecycle
management of the allocated lsmblob is simple. (patch 0005)
- Dropped Acks due to the above change (patch 0005)
- Improved commentary on secmark labeling scaffolding. (patch 0006)
- Reduced secmark related labeling scaffolding. (patch 0006)
- Replace use of the zeroth entry of an lsmblob in scaffolding
with a function lsmblob_value() to hopefully make it less
obscure. (patch 0006)
- Convert security_secmark_relabel_packet to use lsmblob as
this reduces much of the most contentious scaffolding. (patch 0006)
- Dropped Acks due to the above change (patch 0006)
- Added BUILD_BUG_ON() for CIPSO tag 6. (patch 0018)
- Reworked audit subject information. Instead of adding fields in
the middle of existing records add a new record to the event. When
a separate record is required use subj="?". (patch 0020)
- Dropped Acks due to the above change (patch 0020)
- Reworked audit object information. Instead of adding fields in
the middle of existing records add a new record to the event. When
a separate record is required use obj="?". (patch 0021)
- Dropped Acks due to the above change (patch 0021)
- Enhanced documentation (patch 0022)
- Removed unnecessary error code check in security_getprocattr()
(patch 0021)
v15: Rebase to 5.6-rc1
- Revise IMA data use (patch 0002)
Incorporate feedback from v14
- Fix lockdown module registration naming (patch 0002)
- Revise how /proc/self/attr/context is gathered. (patch 0022)
- Revise access modes on /proc/self/attr/context. (patch 0022)
- Revise documentation on LSM external interfaces. (patch 0022)
v14: Rebase to 5.5-rc5
Incorporate feedback from v13
- Use an array of audit rules (patch 0002)
- Significant change, removed Acks (patch 0002)
- Remove unneeded include (patch 0013)
- Use context.len correctly (patch 0015)
- Reorder code to be more sensible (patch 0016)
- Drop SO_PEERCONTEXT as it's not needed yet (patch 0023)
v13: Rebase to 5.5-rc2
Incorporate feedback from v12
- Print lsmblob size with %z (Patch 0002)
- Convert lockdown LSM initialization. (Patch 0002)
- Restore error check in nft_secmark_compute_secid (Patch 0006)
- Correct blob scaffolding in ima_must_appraise() (Patch 0009)
- Make security_setprocattr() clearer (Patch 0013)
- Use lsm_task_display more widely (Patch 0013)
- Use passed size in lsmcontext_init() (Patch 0014)
- Don't add a smack_release_secctx() hook (Patch 0014)
- Don't print warning in security_release_secctx() (Patch 0014)
- Don't duplicate the label in nfs4_label_init_security() (Patch 0016)
- Remove reviewed-by as code has significant change (Patch 0016)
- Send the entire lsmblob for Tag 6 (Patch 0019)
- Fix description of socket_getpeersec_stream parameters (Patch 0023)
- Retain LSMBLOB_FIRST. What was I thinking? (Patch 0023)
- Add compound context to LSM documentation (Patch 0023)
v12: Rebase to 5.5-rc1
Fixed a couple of incorrect contractions in the text.
v11: Rebase to 5.4-rc6
Incorporate feedback from v10
- Disambiguate reading /proc/.../attr/display by restricting
all use of the interface to the current process.
- Fix a merge error in AppArmor's display attribute check
v10: Ask the security modules if the display can be changed.
v9: There is no version 9
v8: Incorporate feedback from v7
- Minor clean-up in display value management
- refactor "compound" context creation to use a common
append_ctx() function.
v7: Incorporate feedback from v6
- Make setting the display a privileged operation. The
availability of compound contexts reduces the need for
setting the display.
v6: Incorporate feedback from v5
- Add subj_<lsm>= and obj_<lsm>= fields to audit records
- Add /proc/.../attr/context to get the full context in
lsmname\0value\0... format as suggested by Simon McVittie
- Add SO_PEERCONTEXT for getsockopt() to get the full context
in the same format, also suggested by Simon McVittie.
- Add /sys/kernel/security/lsm_display_default to provide
the display default value.
v5: Incorporate feedback from v4
- Initialize the lsmcontext in security_secid_to_secctx()
- Clear the lsmcontext in all security_release_secctx() cases
- Don't use the "display" on strictly internal context
interfaces.
- The SELinux binder hooks check for cases where the context
"display" isn't compatible with SELinux.
v4: Incorporate feedback from v3
- Mark new lsm_<blob>_alloc functions static
- Replace the lsm and slot fields of the security_hook_list
with a pointer to a LSM allocated lsm_id structure. The
LSM identifies if it needs a slot explicitly. Use the
lsm_id rather than make security_add_hooks return the
slot value.
- Validate slot values used in security.c
- Reworked the "display" process attribute handling so that
it works right and doesn't use goofy list processing.
- fix display value check in dentry_init_security
- Replace audit_log of secids with '?' instead of deleting
the audit log
v3: Incorporate feedback from v2
- Make lsmblob parameter and variable names more
meaningful, changing "le" and "l" to "blob".
- Improve consistency of constant naming.
- Do more sanity checking during LSM initialization.
- Be a bit clearer about what is temporary scaffolding.
- Rather than clutter security_getpeersec_dgram with
otherwise unnecessary checks remove the apparmor
stub, which does nothing useful.
Patch 0001 moves management of the sock security blob
from the individual modules to the infrastructure.
Patches 0002-0011 replace system use of a "secid" with
a structure "lsmblob" containing information from the
security modules to be held and reused later. At this
point lsmblob contains an array of u32 secids, one "slot"
for each of the security modules compiled into the
kernel that used secids. A "slot" is allocated when
a security module requests one.
The infrastructure is changed to use the slot number
to pass the correct secid to or from the security module
hooks.
It is important that the lsmblob be a fixed size entity
that does not have to be allocated. Several of the places
where it is used would have performance and/or locking
issues with dynamic allocation.
Patch 0012 provides a mechanism for a process to
identify which security module's hooks should be used
when displaying or converting a security context string.
A new interface /proc/self/attr/display contains the name
of the security module to show. Reading from this file
will present the name of the module, while writing to
it will set the value. Only names of active security
modules are accepted. Internally, the name is translated
to the appropriate "slot" number for the module which
is then stored in the task security blob. Setting the
display requires that all modules using the /proc interfaces
allow the transition. The "display" of other processess
can be neither read nor written. All suggested cases
for reading the display of a different process have race
conditions.
Patch 0013 Starts the process of changing how a security
context is represented. Since it is possible for a
security context to have been generated by more than one
security module it is now necessary to note which module
created a security context so that the correct "release"
hook can be called. There are several places where the
module that created a security context cannot be inferred.
This is achieved by introducing a "lsmcontext" structure
which contains the context string, its length and the
"slot" number of the security module that created it.
The security_release_secctx() interface is changed,
replacing the (string,len) pointer pair with a lsmcontext
pointer.
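To make the reason for recording the slot concrete, here is a minimal
Python sketch of the idea. It is not the kernel interface: the
RELEASE_HOOKS table, the LsmContext class and release_secctx() are
hypothetical stand-ins, and the slot numbers are arbitrary.

```python
from dataclasses import dataclass

released = []  # records which module's release hook ran (illustration only)

# Hypothetical per-module release hooks, indexed by slot number.
RELEASE_HOOKS = {
    0: lambda ctx: released.append(("selinux", ctx)),
    1: lambda ctx: released.append(("apparmor", ctx)),
}


@dataclass
class LsmContext:
    context: str  # the security context string
    length: int   # its length
    slot: int     # slot of the module that created the context


def release_secctx(cp: LsmContext) -> None:
    """Call the release hook of the module that created the context."""
    RELEASE_HOOKS[cp.slot](cp.context)


cp = LsmContext(context="unconfined", length=len("unconfined"), slot=1)
release_secctx(cp)
```

With only a (string,len) pair the caller could not tell which hook to
use; carrying the slot alongside the string removes that guesswork.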
Patches 0014-0016 convert the security interfaces from
(string,len) pointer pairs to a lsmcontext pointer.
The slot number identifying the creating module is
added by the infrastructure. Where the security context
is stored for extended periods the data type is changed.
The Netlabel code is converted to save lsmblob structures
instead of secids in Patch 0017. This is not strictly
necessary as there can only be one security module that
uses Netlabel at this point. Using a lsmblob is much
cleaner, as the interfaces that use the data have all
been converted.
Patch 0018 adds checks to the binder hooks which verify
that both ends of a transaction use the same "display".
Patches 0019-0021 add additional audit records for subject
and object LSM data when there are multiple security modules
with such data. The AUDIT_MAC_TASK_CONTEXTS record is
used in conjunction with a "subj=?" field to identify the
subject data. The AUDIT_MAC_OBJ_CONTEXTS record is used in
conjunction with an "obj=?" field to identify the object data.
The AUDIT_MAC_TASK_CONTEXTS record identifies the security
module with the data: "subj_selinux=xyz_t subj_apparmor=abc".
The AUDIT_MAC_OBJ_CONTEXTS record identifies the security
module with the data: "obj_selinux=xyz_t obj_apparmor=abc".
While AUDIT_MAC_TASK_CONTEXTS records will always contain
an entry for each possible security module, AUDIT_MAC_OBJ_CONTEXTS
records will only contain entries for security modules for
which the object in question has data.
An example of the MAC_TASK_CONTEXTS (1420) record is:
type=UNKNOWN[1420]
msg=audit(1600880931.832:113)
subj_apparmor==unconfined
subj_smack=_
An example of the MAC_OBJ_CONTEXTS (1421) record is:
type=UNKNOWN[1421]
msg=audit(1601152467.009:1050):
obj_selinux=unconfined_u:object_r:user_home_t:s0
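A consumer of these records could pull out the per-module values along
the following lines. This is a sketch in Python, not part of the
patchset; lsm_contexts() is a hypothetical helper, and it assumes the
record arrives as one whitespace-separated line whose values contain no
spaces.

```python
def lsm_contexts(record: str, prefix: str) -> dict:
    """Collect fields named <prefix><module>=<value> from one audit record."""
    out = {}
    for field in record.split():
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = field.partition("=")
        if sep and key.startswith(prefix):
            out[key[len(prefix):]] = value
    return out


task_record = (
    "type=UNKNOWN[1420] msg=audit(1600880931.832:113) "
    "subj_apparmor==unconfined subj_smack=_"
)
obj_record = (
    "type=UNKNOWN[1421] msg=audit(1601152467.009:1050): "
    "obj_selinux=unconfined_u:object_r:user_home_t:s0"
)

subjects = lsm_contexts(task_record, "subj_")
objects = lsm_contexts(obj_record, "obj_")
```

Splitting on the first "=" only matters for the AppArmor value in the
example above, which itself begins with "=".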
Patch 0022 adds a new interface for getting the
compound security contexts, /proc/self/attr/context.
An example of the content of this file is:
selinux\0one_u:one_r:one_t:s0-s0:c0.c1023\0apparmor\0unconfined\0
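Going by the example above, the file holds NUL-terminated module names
and context strings in alternation. A reader might decode it roughly as
follows; this is a Python sketch, parse_compound_context() is a
hypothetical helper, and the strict pairing is an assumption drawn from
the single example.

```python
def parse_compound_context(raw: bytes) -> dict:
    """Split 'name\\0value\\0name\\0value\\0' into a {name: value} dict."""
    parts = raw.split(b"\x00")
    # The final terminator leaves one empty trailing piece; drop it.
    if parts and parts[-1] == b"":
        parts.pop()
    if len(parts) % 2:
        raise ValueError("expected alternating module names and contexts")
    return {
        parts[i].decode(): parts[i + 1].decode()
        for i in range(0, len(parts), 2)
    }


sample = (
    b"selinux\x00one_u:one_r:one_t:s0-s0:c0.c1023\x00"
    b"apparmor\x00unconfined\x00"
)
contexts = parse_compound_context(sample)
```

On a kernel carrying this patchset the bytes would come from reading
/proc/self/attr/context in binary mode rather than from a literal.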
Finally, with all interference on the AppArmor hooks
removed, Patch 0023 removes the exclusive bit from
AppArmor. An unnecessary stub hook was also removed.
The Ubuntu project is using an earlier version of
this patchset in their distribution to enable stacking
for containers.
Performance measurements to date have the change
within the "noise". The sockperf and dbench results
are on the order of 0.2% to 0.8% difference, with
better performance being as common as worse. The
benchmarks were run with AppArmor and Smack on Ubuntu.
https://github.com/cschaufler/lsm-stacking.git#stack-5.10-rc1-v22
Signed-off-by: Casey Schaufler <casey(a)schaufler-ca.com>
---
3 years, 11 months
[RFC PATCH] audit-testsuite: tests for subject and object correctness
by Casey Schaufler
Verify that there are subj= and obj= fields in a record
if and only if they are expected. A system without a security
module that provides these fields should not include them.
A system with multiple security modules providing these fields
(e.g. SELinux and AppArmor) should always provide "?" for the
data and also include an AUDIT_MAC_TASK_CONTEXTS or
AUDIT_MAC_OBJ_CONTEXTS record. The test uses the LSM list from
/sys/kernel/security/lsm to determine which format is expected.
Signed-off-by: Casey Schaufler <casey(a)schaufler-ca.com>
---
tests/Makefile | 1 +
tests/multiple_contexts/Makefile | 12 +++
tests/multiple_contexts/test | 166 +++++++++++++++++++++++++++++++
3 files changed, 179 insertions(+)
create mode 100644 tests/multiple_contexts/Makefile
create mode 100755 tests/multiple_contexts/test
diff --git a/tests/Makefile b/tests/Makefile
index a7f242a..f20f6b1 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -18,6 +18,7 @@ TESTS := \
file_create \
file_delete \
file_rename \
+ multiple_contexts \
filter_exclude \
filter_saddr_fam \
filter_sessionid \
diff --git a/tests/multiple_contexts/Makefile b/tests/multiple_contexts/Makefile
new file mode 100644
index 0000000..c2a8e87
--- /dev/null
+++ b/tests/multiple_contexts/Makefile
@@ -0,0 +1,12 @@
+#
+# Copyright (C) Intel Corporation, 2020
+#
+
+TARGETS=$(patsubst %.c,%,$(wildcard *.c))
+
+LDLIBS += -lpthread
+
+all: $(TARGETS)
+clean:
+ rm -f $(TARGETS)
+
diff --git a/tests/multiple_contexts/test b/tests/multiple_contexts/test
new file mode 100755
index 0000000..c9afed5
--- /dev/null
+++ b/tests/multiple_contexts/test
@@ -0,0 +1,166 @@
+#!/usr/bin/perl
+#
+# Copyright (C) Intel Corporation, 2020
+#
+
+use strict;
+
+use Test;
+BEGIN { plan tests => 3 }
+
+use File::Temp qw/ tempdir tempfile /;
+
+###
+# functions
+
+sub key_gen {
+ my @chars = ( "A" .. "Z", "a" .. "z" );
+ my $key = "testsuite-" . time . "-";
+ $key .= $chars[ rand @chars ] for 1 .. 8;
+ return $key;
+}
+
+###
+# setup
+
+# reset audit
+system("auditctl -D > /dev/null 2>&1");
+
+my $line;
+my $lsm_out;
+my $lsm_count = 0;
+my $bpf_enabled = 0;
+
+open($lsm_out, "cat /sys/kernel/security/lsm |");
+while ( $line = <$lsm_out> ) {
+ if ( $line =~ /selinux/ ) {
+ $lsm_count = $lsm_count + 1;
+ }
+ if ( $line =~ /smack/ ) {
+ $lsm_count = $lsm_count + 1;
+ }
+ if ( $line =~ /apparmor/ ) {
+ $lsm_count = $lsm_count + 1;
+ }
+ if ( $line =~ /bpf/ ) {
+ $bpf_enabled = 1;
+ }
+}
+close($lsm_out);
+
+if ( $lsm_count and $bpf_enabled ) {
+ $lsm_count = $lsm_count + 1;
+}
+# create temp directory
+my $dir = tempdir( TEMPLATE => '/tmp/audit-testsuite-XXXX', CLEANUP => 1 );
+
+# create stdout/stderr sinks
+( my $fh_out, my $stdout ) = tempfile(
+ TEMPLATE => '/tmp/audit-testsuite-out-XXXX',
+ UNLINK => 1
+);
+( my $fh_err, my $stderr ) = tempfile(
+ TEMPLATE => '/tmp/audit-testsuite-err-XXXX',
+ UNLINK => 1
+);
+
+###
+# tests
+
+# create a test file
+( my $fh, my $filename ) =
+ tempfile( TEMPLATE => $dir . "/file-XXXX", UNLINK => 1 );
+
+# set the directory watch
+my $key = key_gen();
+system("auditctl -w $dir -k $key");
+
+# delete file
+unlink($filename);
+
+# make sure the records had a chance to bubble through to the logs
+system("auditctl -m syncmarker-$key");
+for ( my $i = 0 ; $i < 10 ; $i++ ) {
+ if ( system("ausearch -m USER | grep -q syncmarker-$key") == 0 ) {
+ last;
+ }
+ select( undef, undef, undef, 0.2 );    # builtin sleep() truncates 0.2 to 0
+}
+
+# test if we generate any audit records from the watch
+my $result = system("ausearch -i -k $key > $stdout 2> $stderr");
+ok( $result, 0 );
+
+# test if we generate a MAC_TASK_CONTEXTS record if and
+# only if it is required.
+#
+# test if we generate a MAC_OBJ_CONTEXTS record if and
+# only if it is required.
+
+my $found_auxsubj = 0;
+my $found_subjattr = 0;
+my $found_regsubj = 0;
+
+my $found_auxobj = 0;
+my $found_objattr = 0;
+my $found_regobj = 0;
+
+while ( $line = <$fh_out> ) {
+
+ if ( $line =~ / subj=\? / ) {
+ $found_auxsubj = 1;
+ } elsif ( $line =~ / subj=/ ) {
+ $found_regsubj = 1;
+ }
+ if ( $line =~ / subj_selinux=/ ) {
+ $found_subjattr = 1;
+ }
+ if ( $line =~ / subj_apparmor=/ ) {
+ $found_subjattr = 1;
+ }
+ if ( $line =~ / subj_smack=/ ) {
+ $found_subjattr = 1;
+ }
+
+ if ( $line =~ / obj=\? / ) {
+ $found_auxobj = 1;
+ } elsif ( $line =~ / obj=/ ) {
+ $found_regobj = 1;
+ }
+ if ( $line =~ / obj_selinux=/ ) {
+ $found_objattr = 1;
+ }
+ if ( $line =~ / obj_apparmor=/ ) {
+ $found_objattr = 1;
+ }
+ if ( $line =~ / obj_smack=/ ) {
+ $found_objattr = 1;
+ }
+}
+
+# three cases:
+# no subj= field or MAC_TASK_CONTEXTS when no supplying LSM
+# subj=$value field, no MAC_TASK_CONTEXTS for exactly one supplying LSM
+# subj=? field and a MAC_TASK_CONTEXTS for more than one supplying LSM
+#
+if ($lsm_count == 0) {
+ ok($found_regsubj == 0 and $found_auxsubj == 0);
+} elsif ($lsm_count == 1) {
+ ok($found_regsubj and $found_auxsubj == 0);
+} else {
+ ok($found_subjattr and $found_auxsubj);
+}
+
+if ($lsm_count == 0) {
+ ok($found_regobj == 0 and $found_auxobj == 0);
+} elsif ($lsm_count == 1) {
+ ok($found_regobj and $found_auxobj == 0);
+} else {
+ ok($found_objattr and $found_auxobj);
+}
+
+###
+# cleanup
+
+system("auditctl -D > /dev/null 2>&1");
+
--
2.24.1
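For readers who do not want to trace the Perl, the decision the test
makes can be restated as a short Python sketch. This is not part of the
patch: the function names are hypothetical, and it matches whole module
names where the test uses regular expressions.

```python
SUPPLYING = ("selinux", "smack", "apparmor")


def count_supplying_lsms(lsm_list: str) -> int:
    """Count LSMs that supply subj=/obj= data, as the test above does."""
    names = lsm_list.strip().split(",")
    count = sum(1 for name in SUPPLYING if name in names)
    # Like the test, count bpf only when another supplying LSM is present.
    if count and "bpf" in names:
        count += 1
    return count


def expected_subject_format(count: int) -> str:
    """Name which of the three cases from the test applies."""
    if count == 0:
        return "no subj= field and no MAC_TASK_CONTEXTS record"
    if count == 1:
        return "subj=<value> field and no MAC_TASK_CONTEXTS record"
    return "subj=? field plus a MAC_TASK_CONTEXTS record"


# The argument mimics the contents of /sys/kernel/security/lsm.
example = expected_subject_format(
    count_supplying_lsms("lockdown,capability,yama,apparmor")
)
```

The object side follows the same three cases with obj= and
MAC_OBJ_CONTEXTS in place of subj= and MAC_TASK_CONTEXTS.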
3 years, 11 months
[PATCH 00/34] fs: idmapped mounts
by Christian Brauner
Hey everyone,
I vanished for a little while to focus on this work here so sorry for
not being available by mail for a while.
For quite a long time we have had issues with sharing mounts between
multiple unprivileged containers with different id mappings, sharing a
rootfs between multiple containers with different id mappings, and also
sharing regular directories and filesystems between users with different
uids and gids. The latter use-cases have become even more important with
the availability and adoption of systemd-homed (cf. [1]) to implement
portable home directories.
The solutions we have tried and proposed so far include the introduction
of fsid mappings, a tiny overlay based filesystem, and an approach to
call override creds in the vfs. None of these solutions have covered all
of the above use-cases.
The solution proposed here has its origins in multiple discussions
during and after the containers microconference at Linux Plumbers
2017.
To the best of my knowledge this involved Aleksa, Stéphane, Eric, David,
James, and myself. A variant of the solution proposed here has also been
discussed, again to the best of my knowledge, after a Linux conference
in St. Petersburg in Russia between Christoph, Tycho, and myself in 2017
after Linux Plumbers.
I've taken the time to finally implement a working version of this
solution over the last weeks to the best of my abilities. Tycho has
signed up for this slightly crazy endeavour as well and he has helped
with the conversion of the xattr codepaths.
The core idea is to make idmappings a property of struct vfsmount
instead of tying it to a process being inside of a user namespace, which
has been the case for all other proposed approaches.
It means that idmappings become a property of bind-mounts, i.e. each
bind-mount can have a separate idmapping. This has the obvious advantage
that idmapped mounts can be created inside of the initial user
namespace, i.e. on the host itself instead of requiring the caller to be
located inside of a user namespace. This enables such use-cases as e.g.
making a usb stick available in multiple locations with different
idmappings (see the vfat port that is part of this patch series).
The vfsmount struct gains a new struct user_namespace member. The
idmapping of the user namespace becomes the idmapping of the mount. A
caller that is either privileged with respect to the user namespace of
the superblock of the underlying filesystem or a caller that is
privileged with respect to the user namespace a mount has been idmapped
with can create a new bind-mount and mark it with a user namespace. The
user namespace the mount will be marked with can be specified by passing
a file descriptor referring to the user namespace as an argument to the
new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag.
By default vfsmounts are marked with the initial user namespace and no
behavioral or performance changes should be observed. All mapping
operations are nops for the initial user namespace.
When a file/inode is accessed through an idmapped mount the i_uid and
i_gid of the inode will be remapped according to the user namespace the
mount has been marked with. When a new object is created based on the
fsuid and fsgid of the caller, they will similarly be remapped according
to the user namespace of the mount the object is created from.
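The two directions of remapping can be modelled in a few lines of
Python. This is only a sketch of the arithmetic, not the kernel code:
Extent, to_mount() and from_mount() are hypothetical names, and the
(filesystem id, mount id, range) ordering is an assumption read off the
--map-mount arguments in the illustrations further down.

```python
from typing import NamedTuple, Optional, Sequence


class Extent(NamedTuple):
    fs_first: int   # first id as stored in the filesystem
    mnt_first: int  # first id as seen through the idmapped mount
    count: int      # number of consecutive ids covered


def to_mount(ident: int, idmap: Sequence[Extent]) -> Optional[int]:
    """Map an on-disk i_uid/i_gid to the id seen through the mount."""
    for e in idmap:
        if e.fs_first <= ident < e.fs_first + e.count:
            return e.mnt_first + (ident - e.fs_first)
    return None  # no mapping for this id


def from_mount(ident: int, idmap: Sequence[Extent]) -> Optional[int]:
    """Map a caller's fsuid/fsgid to the id stored in the filesystem."""
    for e in idmap:
        if e.mnt_first <= ident < e.mnt_first + e.count:
            return e.fs_first + (ident - e.mnt_first)
    return None  # no mapping: the new object cannot be given an owner


# Mappings modelled on the first two illustrations below.
home = [Extent(1000, 1001, 1)]   # b:1000:1001:1
rootfs = [Extent(1, 1, 65536)]   # b:1:1:65536, nothing covers id 0

seen_owner = to_mount(1000, home)      # uid 1000 on disk, seen via mount
stored_owner = from_mount(1001, home)  # uid 1001 creates a file via mount
root_owner = from_mount(0, rootfs)     # root creating a file via the mount
```

In this model the missing mapping for id 0 is what the failed mkdir as
root in the second illustration corresponds to. Nothing here modifies
the stored ids; the translation happens only when an id is looked at.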
This means the user namespace of the mount needs to be passed down into
a few relevant inode_operations. This mostly includes inode operations
that create filesystem objects or change file attributes. Some of them
such as ->getattr() don't even need to change since they pass down a
struct path and thus the struct vfsmount is already available. Other
inode operations need to be adapted to pass down the user namespace the
vfsmount has been marked with. Al was nice enough to point out that he
will not tolerate struct vfsmount being passed to filesystems and that I
should pass down the user namespace directly; which is what I did.
The inode struct itself is never altered whenever the i_uid and i_gid
need to be mapped, i.e. i_uid and i_gid are only remapped at the time of
the check. An inode once initialized (during lookup or object creation)
is never altered when accessed through an idmapped mount.
To limit the amount of noise in this first iteration we have not changed
the existing inode operations but rather introduced a few new struct
inode operation methods such as ->mkdir_mapped which pass down the user
namespace of the mount they have been called from. Should this solution
be worth pursuing we have no problem adapting the existing inode
operations instead.
In order to support idmapped mounts, filesystems need to be changed and
mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. In this first
iteration I tried to illustrate this by changing three different
filesystems with different levels of complexity, of course with some
bias towards urgent use-cases and filesystems I was at least a little
more familiar with. However, Tycho and I (and others) have no problem
converting each filesystem one-by-one. This first iteration includes fat
(msdos and vfat), ext4, and overlayfs (both with idmapped lower and
upper directories and idmapped merged directories). I'm sure I haven't
gotten everything right for all three of them in the first version of
this patch.
I have written a simple tool [2] that makes it possible to create
idmapped mounts so people can play with this patch series. Here are a
few illustrations:
1. Create a simple idmapped mount of another user's home directory
u1001@f2-vm:/$ sudo ./mount-idmapped --map-mount b:1000:1001:1 /home/ubuntu/ /mnt
u1001@f2-vm:/$ ls -al /home/ubuntu/
total 28
drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
drwxr-xr-x 4 root root 4096 Oct 28 04:00 ..
-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
-rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ ls -al /mnt/
total 28
drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 .
drwxr-xr-x 29 root root 4096 Oct 28 22:01 ..
-rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile
-rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ touch /mnt/my-file
u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
u1001@f2-vm:/$ ls -al /mnt/my-file
-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
u1001@f2-vm:/$ getfacl /mnt/my-file
getfacl: Removing leading '/' from absolute path names
# file: mnt/my-file
# owner: u1001
# group: u1001
user::rw-
user:u1001:rwx
group::rw-
mask::rwx
other::r--
u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
getfacl: Removing leading '/' from absolute path names
# file: home/ubuntu/my-file
# owner: ubuntu
# group: ubuntu
user::rw-
user:ubuntu:rwx
group::rw-
mask::rwx
other::r--
2. Create mapping of the whole ext4 rootfs without a mapping for uid and gid 0
ubuntu@f2-vm:~$ sudo /mount-idmapped --map-mount b:1:1:65536 / /mnt/
ubuntu@f2-vm:~$ findmnt | grep mnt
└─/mnt /dev/sda2 ext4 rw,relatime
└─/mnt/mnt /dev/sda2 ext4 rw,relatime
ubuntu@f2-vm:~$ sudo mkdir /AS-ROOT-CAN-CREATE
ubuntu@f2-vm:~$ sudo mkdir /mnt/AS-ROOT-CANT-CREATE
mkdir: cannot create directory ‘/mnt/AS-ROOT-CANT-CREATE’: Value too large for defined data type
ubuntu@f2-vm:~$ mkdir /mnt/home/ubuntu/AS-USER-1000-CAN-CREATE
3. Create a vfat usb mount and expose to user 1001 and 5000
ubuntu@f2-vm:/$ sudo mount /dev/sdb /mnt
ubuntu@f2-vm:/$ findmnt | grep mnt
└─/mnt /dev/sdb vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro
ubuntu@f2-vm:/$ ls -al /mnt
total 12
drwxr-xr-x 2 root root 4096 Jan 1 1970 .
drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
-rwxr-xr-x 1 root root 4 Oct 28 03:44 aaa
-rwxr-xr-x 1 root root 0 Oct 28 01:09 bbb
ubuntu@f2-vm:/$ sudo /mount-idmapped --map-mount b:0:1001:1 /mnt /mnt-1001/
ubuntu@f2-vm:/$ ls -al /mnt-1001/
total 12
drwxr-xr-x 2 u1001 u1001 4096 Jan 1 1970 .
drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
-rwxr-xr-x 1 u1001 u1001 4 Oct 28 03:44 aaa
-rwxr-xr-x 1 u1001 u1001 0 Oct 28 01:09 bbb
ubuntu@f2-vm:/$ sudo /mount-idmapped --map-mount b:0:5000:1 /mnt /mnt-5000/
ubuntu@f2-vm:/$ ls -al /mnt-5000/
total 12
drwxr-xr-x 2 5000 5000 4096 Jan 1 1970 .
drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
-rwxr-xr-x 1 5000 5000 4 Oct 28 03:44 aaa
-rwxr-xr-x 1 5000 5000 0 Oct 28 01:09 bbb
4. Create an idmapped rootfs mount for a container
root@f2-vm:~# ls -al /var/lib/lxc/f2/rootfs/
total 68
drwxr-xr-x 17 20000 20000 4096 Sep 24 07:48 .
drwxrwx--- 3 20000 20000 4096 Oct 16 19:26 ..
lrwxrwxrwx 1 20000 20000 7 Sep 24 07:43 bin -> usr/bin
drwxr-xr-x 2 20000 20000 4096 Apr 15 2020 boot
drwxr-xr-x 3 20000 20000 4096 Oct 16 19:26 dev
drwxr-xr-x 61 20000 20000 4096 Oct 16 19:26 etc
drwxr-xr-x 3 20000 20000 4096 Sep 24 07:45 home
lrwxrwxrwx 1 20000 20000 7 Sep 24 07:43 lib -> usr/lib
lrwxrwxrwx 1 20000 20000 9 Sep 24 07:43 lib32 -> usr/lib32
lrwxrwxrwx 1 20000 20000 9 Sep 24 07:43 lib64 -> usr/lib64
lrwxrwxrwx 1 20000 20000 10 Sep 24 07:43 libx32 -> usr/libx32
drwxr-xr-x 2 20000 20000 4096 Sep 24 07:43 media
drwxr-xr-x 2 20000 20000 4096 Sep 24 07:43 mnt
drwxr-xr-x 2 20000 20000 4096 Sep 24 07:43 opt
drwxr-xr-x 2 20000 20000 4096 Apr 15 2020 proc
drwx------ 2 20000 20000 4096 Sep 24 07:43 root
drwxr-xr-x 2 20000 20000 4096 Sep 24 07:45 run
lrwxrwxrwx 1 20000 20000 8 Sep 24 07:43 sbin -> usr/sbin
drwxr-xr-x 2 20000 20000 4096 Sep 24 07:43 srv
drwxr-xr-x 2 20000 20000 4096 Apr 15 2020 sys
drwxrwxrwt 2 20000 20000 4096 Sep 24 07:44 tmp
drwxr-xr-x 13 20000 20000 4096 Sep 24 07:43 usr
drwxr-xr-x 12 20000 20000 4096 Sep 24 07:44 var
root@f2-vm:~# /mount-idmapped --map-mount b:20000:10000:100000 /var/lib/lxc/f2/rootfs/ /mnt
root@f2-vm:~# ls -al /mnt
total 68
drwxr-xr-x 17 10000 10000 4096 Sep 24 07:48 .
drwxr-xr-x 34 root root 4096 Oct 28 22:24 ..
lrwxrwxrwx 1 10000 10000 7 Sep 24 07:43 bin -> usr/bin
drwxr-xr-x 2 10000 10000 4096 Apr 15 2020 boot
drwxr-xr-x 3 10000 10000 4096 Oct 16 19:26 dev
drwxr-xr-x 61 10000 10000 4096 Oct 16 19:26 etc
drwxr-xr-x 3 10000 10000 4096 Sep 24 07:45 home
lrwxrwxrwx 1 10000 10000 7 Sep 24 07:43 lib -> usr/lib
lrwxrwxrwx 1 10000 10000 9 Sep 24 07:43 lib32 -> usr/lib32
lrwxrwxrwx 1 10000 10000 9 Sep 24 07:43 lib64 -> usr/lib64
lrwxrwxrwx 1 10000 10000 10 Sep 24 07:43 libx32 -> usr/libx32
drwxr-xr-x 2 10000 10000 4096 Sep 24 07:43 media
drwxr-xr-x 2 10000 10000 4096 Sep 24 07:43 mnt
drwxr-xr-x 2 10000 10000 4096 Sep 24 07:43 opt
drwxr-xr-x 2 10000 10000 4096 Apr 15 2020 proc
drwx------ 2 10000 10000 4096 Sep 24 07:43 root
drwxr-xr-x 2 10000 10000 4096 Sep 24 07:45 run
lrwxrwxrwx 1 10000 10000 8 Sep 24 07:43 sbin -> usr/sbin
drwxr-xr-x 2 10000 10000 4096 Sep 24 07:43 srv
drwxr-xr-x 2 10000 10000 4096 Apr 15 2020 sys
drwxrwxrwt 2 10000 10000 4096 Sep 24 07:44 tmp
drwxr-xr-x 13 10000 10000 4096 Sep 24 07:43 usr
drwxr-xr-x 12 10000 10000 4096 Sep 24 07:44 var
root@f2-vm:~# lxc-start f2 # uses /mnt as rootfs
root@f2-vm:~# lxc-attach f2 -- cat /proc/1/uid_map
0 10000 10000
root@f2-vm:~# lxc-attach f2 -- cat /proc/1/gid_map
0 10000 10000
root@f2-vm:~# lxc-attach f2 -- ls -al /
total 52
drwxr-xr-x 17 root root 4096 Sep 24 07:48 .
drwxr-xr-x 17 root root 4096 Sep 24 07:48 ..
lrwxrwxrwx 1 root root 7 Sep 24 07:43 bin -> usr/bin
drwxr-xr-x 2 root root 4096 Apr 15 2020 boot
drwxr-xr-x 5 root root 500 Oct 28 23:39 dev
drwxr-xr-x 61 root root 4096 Oct 28 23:39 etc
drwxr-xr-x 3 root root 4096 Sep 24 07:45 home
lrwxrwxrwx 1 root root 7 Sep 24 07:43 lib -> usr/lib
lrwxrwxrwx 1 root root 9 Sep 24 07:43 lib32 -> usr/lib32
lrwxrwxrwx 1 root root 9 Sep 24 07:43 lib64 -> usr/lib64
lrwxrwxrwx 1 root root 10 Sep 24 07:43 libx32 -> usr/libx32
drwxr-xr-x 2 root root 4096 Sep 24 07:43 media
drwxr-xr-x 2 root root 4096 Sep 24 07:43 mnt
drwxr-xr-x 2 root root 4096 Sep 24 07:43 opt
dr-xr-xr-x 232 nobody nogroup 0 Oct 28 23:39 proc
drwx------ 2 root root 4096 Oct 28 23:41 root
drwxr-xr-x 12 root root 360 Oct 28 23:39 run
lrwxrwxrwx 1 root root 8 Sep 24 07:43 sbin -> usr/sbin
drwxr-xr-x 2 root root 4096 Sep 24 07:43 srv
dr-xr-xr-x 13 nobody nogroup 0 Oct 28 23:39 sys
drwxrwxrwt 11 root root 4096 Oct 28 23:40 tmp
drwxr-xr-x 13 root root 4096 Sep 24 07:43 usr
drwxr-xr-x 12 root root 4096 Sep 24 07:44 var
root@f2-vm:~# lxc-attach f2 -- ls -al /my-file
-rw-r--r-- 1 root root 0 Oct 28 23:43 /my-file
root@f2-vm:~# ls -al /var/lib/lxc/f2/rootfs/my-file
-rw-r--r-- 1 20000 20000 0 Oct 28 23:43 /var/lib/lxc/f2/rootfs/my-file
[1]: https://systemd.io/HOME_DIRECTORY/
"If the UID assigned to a user does not match the owner of the home
directory in the file system, the home directory is automatically
and recursively chown()ed to the correct UID."
This has a huge performance impact and is also problematic since it
chowns all files independent of ownership.
[2]: https://github.com/brauner/mount-idmapped
In no particular order I'd like to say thanks to:
Al for pointing me in the right direction to avoid inode alias issues
during lookup. David for various discussions around this. Tycho for helping
with this series and on future patches if this is in any shape or form
acceptable. Alban Crequy for pointing out more application container
use-cases. Stéphane for various valuable input on various use-cases and
letting me work on this. Amir for explaining and discussing aspects of
overlayfs with me.
I'd like to especially thank Seth Forshee because he provided a lot of
good analysis, suggestions, and participated in short-notice discussions
in both chat and video.
This series can be found and pulled in three locations:
https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=...
https://github.com/brauner/linux/tree/idmapped_mounts
https://gitlab.com/brauner/linux/-/commits/idmapped_mounts
Thanks!
Christian
Christian Brauner (32):
namespace: take lock_mount_hash() directly when changing flags
namespace: only take read lock in do_reconfigure_mnt()
fs: add mount_setattr()
tests: add mount_setattr() selftests
fs: introduce MOUNT_ATTR_IDMAP
fs: add id translation helpers
capability: handle idmapped mounts
namei: add idmapped mount aware permission helpers
inode: add idmapped mount aware init and permission helpers
attr: handle idmapped mounts
acl: handle idmapped mounts
commoncap: handle idmapped mounts
stat: add mapped_generic_fillattr()
namei: handle idmapped mounts in may_*() helpers
namei: introduce struct renamedata
namei: prepare for idmapped mounts
namei: add lookup helpers with idmapped mounts aware permission
checking
open: handle idmapped mounts in do_truncate()
open: handle idmapped mounts
af_unix: handle idmapped mounts
utimes: handle idmapped mounts
would_dump: handle idmapped mounts
exec: handle idmapped mounts
fs: add helpers for idmap mounts
apparmor: handle idmapped mounts
audit: handle idmapped mounts
ima: handle idmapped mounts
ext4: support idmapped mounts
expfs: handle idmapped mounts
overlayfs: handle idmapped lower directories
overlayfs: handle idmapped merged mounts
fat: handle idmapped mounts
Tycho Andersen (2):
xattr: handle idmapped mounts
selftests: add idmapped mounts xattr selftest
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/Kconfig | 6 +
fs/attr.c | 142 ++-
fs/coredump.c | 12 +-
fs/exec.c | 12 +-
fs/exportfs/expfs.c | 4 +-
fs/ext4/acl.c | 11 +-
fs/ext4/acl.h | 3 +
fs/ext4/ext4.h | 14 +-
fs/ext4/file.c | 4 +
fs/ext4/ialloc.c | 7 +-
fs/ext4/inode.c | 27 +-
fs/ext4/ioctl.c | 18 +-
fs/ext4/namei.c | 145 ++-
fs/ext4/super.c | 4 +
fs/ext4/symlink.c | 9 +
fs/ext4/xattr_hurd.c | 22 +-
fs/ext4/xattr_security.c | 18 +-
fs/ext4/xattr_trusted.c | 18 +-
fs/fat/fat.h | 2 +
fs/fat/file.c | 27 +-
fs/fat/namei_msdos.c | 7 +
fs/fat/namei_vfat.c | 7 +
fs/inode.c | 66 +-
fs/internal.h | 9 +
fs/namei.c | 597 ++++++++----
fs/namespace.c | 446 ++++++++-
fs/open.c | 52 +-
fs/overlayfs/copy_up.c | 104 +-
fs/overlayfs/dir.c | 219 +++--
fs/overlayfs/export.c | 3 +-
fs/overlayfs/file.c | 23 +-
fs/overlayfs/inode.c | 121 ++-
fs/overlayfs/namei.c | 64 +-
fs/overlayfs/overlayfs.h | 158 +++-
fs/overlayfs/ovl_entry.h | 1 +
fs/overlayfs/readdir.c | 34 +-
fs/overlayfs/super.c | 109 ++-
fs/overlayfs/util.c | 38 +-
fs/posix_acl.c | 130 ++-
fs/stat.c | 18 +-
fs/utimes.c | 4 +-
fs/xattr.c | 264 ++++--
include/linux/audit.h | 10 +-
include/linux/capability.h | 12 +-
include/linux/fs.h | 254 ++++-
include/linux/ima.h | 15 +-
include/linux/lsm_hook_defs.h | 10 +-
include/linux/lsm_hooks.h | 1 +
include/linux/mount.h | 20 +-
include/linux/namei.h | 6 +
include/linux/posix_acl.h | 14 +-
include/linux/posix_acl_xattr.h | 12 +-
include/linux/security.h | 36 +-
include/linux/syscalls.h | 3 +
include/linux/xattr.h | 29 +
include/uapi/asm-generic/unistd.h | 4 +-
include/uapi/linux/mount.h | 26 +
ipc/mqueue.c | 8 +-
kernel/auditsc.c | 29 +-
kernel/capability.c | 22 +-
net/unix/af_unix.c | 2 +-
security/apparmor/domain.c | 9 +-
security/apparmor/file.c | 5 +-
security/apparmor/lsm.c | 12 +-
security/commoncap.c | 50 +-
security/integrity/ima/ima.h | 19 +-
security/integrity/ima/ima_api.c | 10 +-
security/integrity/ima/ima_appraise.c | 14 +-
security/integrity/ima/ima_asymmetric_keys.c | 2 +-
security/integrity/ima/ima_main.c | 28 +-
security/integrity/ima/ima_policy.c | 17 +-
security/integrity/ima/ima_queue_keys.c | 2 +-
security/security.c | 18 +-
security/selinux/hooks.c | 13 +-
security/smack/smack_lsm.c | 11 +-
tools/include/uapi/asm-generic/unistd.h | 4 +-
tools/testing/selftests/Makefile | 1 +
.../testing/selftests/idmap_mounts/.gitignore | 1 +
tools/testing/selftests/idmap_mounts/Makefile | 8 +
tools/testing/selftests/idmap_mounts/config | 1 +
tools/testing/selftests/idmap_mounts/xattr.c | 389 ++++++++
.../selftests/mount_setattr/.gitignore | 1 +
.../testing/selftests/mount_setattr/Makefile | 7 +
tools/testing/selftests/mount_setattr/config | 1 +
.../mount_setattr/mount_setattr_test.c | 888 ++++++++++++++++++
102 files changed, 4109 insertions(+), 912 deletions(-)
create mode 100644 tools/testing/selftests/idmap_mounts/.gitignore
create mode 100644 tools/testing/selftests/idmap_mounts/Makefile
create mode 100644 tools/testing/selftests/idmap_mounts/config
create mode 100644 tools/testing/selftests/idmap_mounts/xattr.c
create mode 100644 tools/testing/selftests/mount_setattr/.gitignore
create mode 100644 tools/testing/selftests/mount_setattr/Makefile
create mode 100644 tools/testing/selftests/mount_setattr/config
create mode 100644 tools/testing/selftests/mount_setattr/mount_setattr_test.c
base-commit: 3650b228f83adda7e5ee532e2b90429c03f7b9ec
--
2.29.0
3 years, 12 months