Alexander Viro <aviro(a)redhat.com> writes:
c) just how do you propose to do "tracking file
descriptors"?
We aren't proposing to track file descriptors. We already have code
that does that. Currently, we collect the traces with a modified
version of strace, but for a variety of reasons, autrace would be a
much better source of trace data. First, we have to modify strace so
it includes security contexts on labeled objects. Second, strace
output is designed to be easily consumed by humans, and it is a bear
for a program to parse. You can see the AWK/YACC/sed pipeline
required to put strace output into a form that can be easily consumed
by a python program by reading the source file
polgen/src/trackfd/trackstrace.in in the polgen CVS repository at
http://sf.net/projects/polgen. You'll quickly realize why I am eager
for the parsing library to be introduced in audit 1.3.
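To give a feel for the parsing problem, here is a minimal Python sketch
that handles only the simplest strace lines. It is not the code in
trackstrace.in; the names SYSCALL_LINE and parse_line are made up for
illustration, and the sketch ignores the security contexts our modified
strace adds.

```python
import re

# Hypothetical pattern for a complete strace line such as
#   open("/etc/passwd", O_RDONLY) = 3
# with an optional leading pid (as printed when following children).
SYSCALL_LINE = re.compile(
    r"^(?:(?P<pid>\d+)\s+)?"   # optional pid column
    r"(?P<name>\w+)"           # system call name
    r"\((?P<args>.*)\)"        # raw argument text, left unparsed
    r"\s+=\s+"
    r"(?P<ret>-?\d+)(?!\w)"    # decimal return value only
)


def parse_line(line):
    """Return (name, args, ret) for a simple line, or None otherwise."""
    m = SYSCALL_LINE.match(line)
    if m is None:
        # Unfinished/resumed calls, signals, exit notices and
        # hexadecimal return values all end up here in this sketch.
        return None
    return m.group("name"), m.group("args"), int(m.group("ret"))
```

Even this leaves the argument text as one raw string; quoted strings,
nested structures, and calls split across "unfinished" and "resumed"
lines are where the real work is.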
The program that does the analysis is in polgen/src/trackfd/trackfd.py.
It analyzes the records for the following system calls:
close open socket pipe socketpair dup dup2 fcntl64 read write bind
accept connect recv send unlink execve clone
The key thing is that the program doesn't really track file descriptors;
instead, it tracks what they refer to. The program generates a data
structure when each file descriptor is created via an open or socket
system call. The system calls dup, fcntl, close, and execve change
mappings of file descriptors to the data structure, and a close system
call causes a summary of reads and writes to be written. Here is the
summary of the file descriptor tracker from the polgen document.
An essential part of the data reduction is the summarization of the
life cycle of a file descriptor. For each file descriptor created
by a program, the @code{trackfd} program creates a data structure.
The data structure is updated whenever a system call is found that
applies to the file descriptor. Finally, when a file descriptor is
closed, a summary of the activity associated with the file
descriptor is written to the output.
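The idea can be sketched in a few lines of Python. This is an
illustration only, not the code in trackfd.py: the class and method
names are invented, only a few of the system calls are covered, and
writing the summary when the last descriptor for an object is closed
is an assumption of the sketch.

```python
class Target:
    """What a descriptor refers to (a file, a socket, ...)."""

    def __init__(self, kind, name):
        self.kind = kind
        self.name = name
        self.reads = 0
        self.writes = 0


class FdTable:
    """Maps descriptor numbers to shared Target records for one process."""

    def __init__(self):
        self.fds = {}        # fd number -> Target
        self.summaries = []  # (kind, name, reads, writes) tuples

    def open(self, fd, kind, name):
        # open and socket create a fresh record for the new descriptor.
        self.fds[fd] = Target(kind, name)

    def dup(self, old_fd, new_fd):
        # dup and dup2 only change the mapping: both numbers now point
        # at the same record. dup2 closes new_fd first if it is in use.
        if new_fd in self.fds and new_fd != old_fd:
            self.close(new_fd)
        self.fds[new_fd] = self.fds[old_fd]

    def read(self, fd):
        self.fds[fd].reads += 1

    def write(self, fd):
        self.fds[fd].writes += 1

    def close(self, fd):
        target = self.fds.pop(fd)
        # Assumption of this sketch: summarize once no descriptor
        # refers to the record any more.
        if all(t is not target for t in self.fds.values()):
            self.summaries.append(
                (target.kind, target.name, target.reads, target.writes)
            )
```

For example, opening a file as descriptor 3, reading it, duplicating
it to 4, writing through 4, and closing both leaves one summary that
counts the read and the write against the same file.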
It's one of those programs that is either correct, or explodes and
dies horribly when a bug is exercised. On Fedora Core, the program
has been quite solid for quite a while now. It has been used to
analyze the Jabber server and an application running in a Java Virtual
Machine.
By the way, we do not claim to handle every possible path for
information flow yet. In fact, we ignore all system calls implemented
by the ipc common kernel entry point. Our experience is that the
current set of system calls we analyze handles a large number of
important target applications.
The trackfd.py file is about 600 lines of code. I tried to make it
easy to read, in case someone takes the time to proofread my code.
John