On Monday, October 03, 2011 03:42:25 PM Casey Schaufler wrote:
> On 10/1/2011 5:31 AM, Steve Grubb wrote:
> > On Friday, September 16, 2011 08:12:15 PM John Feuerstein wrote:
> >> I would like to audit all changes to a directory tree using the Linux
> >> auditing system[1].
> >>
> >> # auditctl -a exit,always -F dir=/etc/ -F perm=wa
> >>
> >> It seems like the GNU coreutils are enough to break the audit trail.
> >
> > I was hoping one of the kernel developers would have got involved with
> > this question. I pointed out the same problem as you maybe 5 years ago.
> > The people working on it at the time said that if you really want to
> > know, just add events for opens and then you can piece it together. In
> > my opinion, that is avoiding the problem and not solving it. There are
> > way too many opens to put into an audit trail on the odd chance that you
> > might have needed one. In 5 years, the kernel has changed and so have
> > the people working on the code. Maybe this problem should be revisited.
>
> Howdy. Kernel developer here.

Hi Casey,

Glad someone jumped into this. :)

> The problem goes way back. Way, way back. I will do my
> best to describe what is going on and why the kernel has
> such a problem with pathnames and audit. I am afraid that
> you may not be happy with the explanation, but I also
> think that you should understand what is going on and why
> it has been so difficult to get a satisfactory resolution.
>
> The Linux (and UNIX before it) kernel does not have an
> internal concept of a path. Pathname resolution is provided
> for the convenience of user space code.
>
> The kernel has a simple view of filesystem objects. They
> are inodes and data blocks. So long as there is a name for
> the inode somewhere on the system the object is retained,
> and once all the names are gone it is expunged. There are
> two kinds of names: open file descriptors and directory
> entries. A directory entry contains exactly one component
> of a pathname. You are not allowed to remove directories
> unless they are empty because that would leave objects with
> names in an inaccessible state.
>
> The Linux filesystem semantics, inherited in all their
> glory from UNIX, permit multiple directory entries to
> refer to the same inode. That means that there can be
> multiple names for the same object in the filesystem
> name space. These names are all peers. None is the "real"
> name of the object. The only possible real name for the
> object is the inode number (combined with an identification
> of the containing filesystem). This identifies the object
> even when all entries in the filesystem namespace are
> gone but the file is open. Auditable events can occur on
> files that are open but have no filesystem entries.
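To make the "all names are peers" point concrete, here is a sketch assuming GNU coreutils `stat` (`%d` device, `%i` inode, `%h` link count); the file names are made up. Both hard links report the same device:inode pair, and removing either one leaves the other fully intact.

```shell
#!/bin/sh
# Sketch: two hard links are peer names for one inode.
# Assumes GNU stat (-c with %d, %i, %h). Paths are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

echo data > "$tmp/a"
ln "$tmp/a" "$tmp/b"                  # second directory entry, same inode

id_a=$(stat -c '%d:%i' "$tmp/a")      # device:inode, the only stable identity
id_b=$(stat -c '%d:%i' "$tmp/b")
links=$(stat -c '%h' "$tmp/a")        # number of directory entries
echo "a -> $id_a, b -> $id_b, link count $links"

rm "$tmp/a"                           # neither name was "more real"
id_b_after=$(stat -c '%d:%i' "$tmp/b")
links_after=$(stat -c '%h' "$tmp/b")
```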
>
> It's a big mess because the auditor obviously wants to
> know the name of the file, but it is entirely possible
> that there are hundreds of names in the filesystem space
> for the object and hundreds of open file descriptors for
> it, none of which were opened through a pathname that
> still refers to that object.
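One way to see how many names an object has is to search for every entry that shares its inode. This is a sketch assuming GNU findutils (`-samefile`, `-xdev`), with hypothetical directory names. Note that it has to walk the filesystem, and it cannot find descriptors whose directory entries were already removed.

```shell
#!/bin/sh
# Sketch: enumerate every directory entry that names the same inode.
# Assumes GNU find (-samefile, -xdev). Paths are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

mkdir "$tmp/etc" "$tmp/home" "$tmp/srv"
echo config > "$tmp/etc/app.conf"
ln "$tmp/etc/app.conf" "$tmp/home/notes"
ln "$tmp/etc/app.conf" "$tmp/srv/data"

# -xdev keeps the walk on one filesystem, since inode numbers are per-filesystem.
names=$(find "$tmp" -xdev -samefile "$tmp/srv/data" | sort)
printf '%s\n' "$names"
count=$(printf '%s\n' "$names" | wc -l | tr -d ' ')
```

All three paths are printed, and nothing in the output marks any of them as the "right" one to log.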
>
> The kernel can keep track of the path used to reach an
> inode, but with hard links, symlinks, mount points and
> namespaces the reality is that you can't identify the
> object involved using that information. The best that
> can be done is to record the pathname requested, the
> pathname resolved, and the inode number. It is impossible
> to track objects by pathname because the pathname is
> not a kernel concept.
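A small illustration of those three things (requested pathname, resolved pathname, inode number) diverging through a symlink, followed by a rename. This sketch assumes GNU coreutils (`readlink -f`, `stat -L -c`). The directory names are hypothetical.

```shell
#!/bin/sh
# Sketch: requested path, resolved path and inode identity are different things.
# Assumes GNU readlink and stat. Paths are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

mkdir "$tmp/real"
echo x > "$tmp/real/file"
ln -s real "$tmp/alias"                     # relative symlink to the directory

requested="$tmp/alias/file"                 # what the caller asked for
resolved=$(readlink -f "$requested")        # after following symlinks
inode=$(stat -L -c '%d:%i' "$requested")    # device:inode of the object itself
echo "requested: $requested"
echo "resolved:  $resolved"
echo "inode:     $inode"

# Rename the directory: both recorded pathnames go stale, the inode does not.
mv "$tmp/real" "$tmp/moved"
inode_after=$(stat -c '%d:%i' "$tmp/moved/file")
```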
>
> It's been this way forever. UNIX audit systems had/have
> the exact same problem. This is why we have AppArmor and
> TOMOYO. Unless someone smarter than I am has an outstanding
> insight, we aren't going to make you happy any time soon.

What I was wondering about is this.

Assumption: The openat audit event should not happen all the time. If it
does, then the system is going to perform poorly due to load. So, if an
event fires, we really aren't on the hot path anymore.

Proposal: If an *at syscall event is triggered, why can't we look at the
fd being passed in and look it up in the same place that the kernel keeps
the path for /proc/<pid>/fd/#?

-Steve
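For context on the /proc/<pid>/fd proposal, here is a hedged sketch of what that link reports. It assumes Linux, GNU readlink and a POSIX sh, and the paths are hypothetical. The link target is generated from the file's current dentry each time it is read. It is not a copy of the string passed to open(). As a result it follows a later rename and gains a " (deleted)" suffix once the last directory entry is gone, so it tells you where an fd points now rather than which pathname was originally requested.

```shell
#!/bin/sh
# Sketch: what /proc/<pid>/fd/N reports before and after the names change.
# Linux-only; assumes GNU readlink. Paths are hypothetical.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
base=$(readlink -f "$tmp")                 # canonical form, for comparison

mkdir "$tmp/dir"
exec 3< "$tmp/dir"                         # a directory fd, like an openat() dirfd
echo hi > "$tmp/log"
exec 4< "$tmp/log"

at_open=$(readlink "/proc/$$/fd/3")
mv "$tmp/dir" "$tmp/renamed"
after_rename=$(readlink "/proc/$$/fd/3")   # reflects the rename
rm "$tmp/log"
after_unlink=$(readlink "/proc/$$/fd/4")   # old path plus " (deleted)"
echo "$at_open -> $after_rename"
echo "$after_unlink"
exec 3<&- 4<&-
```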