I want to revive this thread so that we can reach closure on this subject and implement the solution as soon as possible.
First, I will summarize what we are trying to do. Then I will state where we left off, restating the original proposal and the responses to it. Finally, I will offer another proposal to restart the discussion. This is a long note.

The Background

One of the CAPP requirements, and probably an LSPP requirement as well, is that when audit records cannot be generated for a particular process, the process needs to be halted. We are considering two separate cases in which audit records cannot be generated:
1) The audit log is full and the audit subsystem cannot write the audit record.
2) The kernel cannot allocate memory for the audit buffer.

One reason these two cases are treated separately is that in the first case (disk full) the audit subsystem can know ahead of time that the audit record cannot be written to disk (auditd can, for example, send a message informing the audit subsystem of this situation). So the audit subsystem is able to suspend the process (or all processes) before they perform auditable actions. In contrast, in the second case (no kernel resources) the audit subsystem cannot know ahead of time that kernel resources are exhausted. Only when it tries to generate the audit record does it discover that no resources are available, and by then the auditable action has already taken place.

The Initial proposal

1) For handling disk full: Whenever auditd detects that the disk is full (or that the log has reached its limit), it sends an AUDIT_SUSPEND message to the kernel. On receipt of this message the kernel sets a flag, "disk_full_flag". If disk_full_flag is set, audit_log_start calls audit_suspend to queue the process in a wait queue. When disk_full_flag is reset, all the processes in the wait queue are rescheduled.
2) For suspending the process whenever there are no kernel resources: I was thinking of using sigsuspend whenever audit_log_lost is called, depending on the "failure flag". Currently the failure flag can only be set to: i) do nothing, ii) print a message, or iii) panic. I was thinking of adding a fourth option to this flag to suspend the processes.
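
To make item 2 concrete, here is a rough sketch of how a fourth failure-flag value might map to an action. It is plain C that compiles in user space, not kernel code. The enum names, the numeric values, and the fallback for unknown values are my assumptions for illustration and are not taken from the existing audit code.

```c
#include <assert.h>

/* Illustrative names and values only. The first three model the existing
 * settings (do nothing, print a message, panic); FAIL_SUSPEND is the
 * hypothetical fourth option from the proposal. */
enum audit_fail_mode {
    FAIL_SILENT  = 0,
    FAIL_PRINTK  = 1,
    FAIL_PANIC   = 2,
    FAIL_SUSPEND = 3
};

/* What the lost-record path should do; a real implementation would act
 * on this instead of just returning it. */
enum audit_lost_action { ACT_NONE, ACT_PRINT, ACT_PANIC, ACT_SUSPEND };

enum audit_lost_action audit_lost_policy(int failure_flag)
{
    assert(failure_flag >= 0);  /* negative values are a caller bug */

    switch (failure_flag) {
    case FAIL_SILENT:  return ACT_NONE;
    case FAIL_PRINTK:  return ACT_PRINT;
    case FAIL_PANIC:   return ACT_PANIC;
    case FAIL_SUSPEND: return ACT_SUSPEND;
    default:           return ACT_PRINT;  /* assumption: report unknown values */
    }
}
```

Keeping the decision (which action) separate from the action itself means the non-blocking callers never have to sleep; only the caller that is allowed to block would act on ACT_SUSPEND.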

Responses to the initial proposal

1) The audit_log* functions should not be changed, because they can be called from different contexts (Chris White).
The processing described can only safely be done from either: a) audit_syscall_exit, or b) some new audit_log* functions that are explicitly identified as potentially blocking (Stephen Smalley).
2) sigsuspend is not safe (Stephen Smalley).
There may not be any local process associated with the event; e.g., SELinux can generate audit data while processing received packets.
Again, the processing can only safely be done from either: a) audit_syscall_exit, or b) some new audit_log* functions that are explicitly identified as potentially blocking.
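
One possible reading of option a), sketched in plain C rather than kernel code: the lost-record path only sets a flag and never sleeps, and the flag is acted on at syscall exit. The struct, its fields, and both function names are hypothetical; I made them up to illustrate the idea, and nobody on the thread proposed this exact form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task audit state; field names are illustrative. */
struct task_audit_ctx {
    bool in_syscall;       /* true between syscall entry and exit */
    bool suspend_pending;  /* a record was lost during this syscall */
};

/* Non-blocking, so it can be called from any context. ctx is NULL when no
 * local process is associated with the event (e.g. packet processing). */
void audit_note_lost(struct task_audit_ctx *ctx)
{
    if (ctx != NULL && ctx->in_syscall)
        ctx->suspend_pending = true;
}

/* Called at syscall exit, the one existing place named as safe to block.
 * Returns true if the caller should now suspend the task. */
bool audit_exit_should_block(struct task_audit_ctx *ctx)
{
    bool block;

    assert(ctx != NULL);
    block = ctx->suspend_pending;
    ctx->suspend_pending = false;
    ctx->in_syscall = false;
    return block;
}
```

Events with no associated process are simply not suspendable in this sketch; how those should be handled is left open.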

Current proposal

1) For handling disk full:
Instead of calling audit_suspend from audit_log_start, I will call it from audit_syscall_entry if the context is auditable. audit_suspend will place the process in a wait queue until the disk_full_flag is reset, at which point all the processes in the wait queue will be awakened.
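
To show the intended behavior, here is a minimal user-space model of the gate. A pthread condition variable stands in for the kernel wait queue, so this is not the kernel implementation. The audit_gate type and the function names are assumptions made for the sketch; only disk_full_flag and the idea of checking at syscall entry come from the proposal.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* User-space model: a condition variable stands in for the wait queue. */
struct audit_gate {
    pthread_mutex_t lock;
    pthread_cond_t  resumed;
    bool            disk_full;  /* the disk_full_flag from the proposal */
    int             waiters;    /* callers currently inside the gate check */
};

#define AUDIT_GATE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0 }

/* Models receipt of AUDIT_SUSPEND from auditd. */
void audit_gate_suspend(struct audit_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->disk_full = true;
    pthread_mutex_unlock(&g->lock);
}

/* Models the flag being reset: wake every parked caller. */
void audit_gate_resume(struct audit_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->disk_full = false;
    pthread_cond_broadcast(&g->resumed);
    pthread_mutex_unlock(&g->lock);
}

/* Models the check at syscall entry; only auditable contexts may block. */
void audit_gate_enter(struct audit_gate *g, bool auditable)
{
    if (!auditable)
        return;

    pthread_mutex_lock(&g->lock);
    g->waiters++;
    while (g->disk_full)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&g->resumed, &g->lock);
    g->waiters--;
    assert(g->waiters >= 0);
    pthread_mutex_unlock(&g->lock);
}
```

The point of gating at entry is that the process is parked before it performs the auditable action, so no record is lost while the log is full.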

Hopefully this is acceptable and I can go ahead and implement this.

The question is how SELinux should treat its audit records in this case. For the current CAPP work this is not an issue; however, it will be for the LSPP evaluation.

2) For suspending the process whenever there are no kernel resources:
audit_log_lost is called from many places and for many reasons (no memory, socket busy, etc.). If we don't want to sleep in any audit_log* function, I need to think a little more about this. Any suggestions?

Mounir Bsaibes
Linux Security
Tel: (512) 838-1301
Cell: (512) 762-9957
Fax: (512) 838-8858
e-mail: bsaibes@us.ibm.com