On January 17, 2021 4:12:44 PM Steve Grubb <sgrubb(a)redhat.com> wrote:
> On Sunday, January 17, 2021 9:07:08 AM EST Paul Moore wrote:
>> On Fri, Jan 15, 2021 at 9:43 PM Burn Alting <burn.alting(a)iinet.net.au> wrote:
>>> On Fri, 2021-01-15 at 19:35 -0500, Richard Guy Briggs wrote:
>>>> Or we go back to userspace code looking for the EOE record? This
>>>> doesn't help if they arrive out of order. Do we number the records in
>>>> the kernel? N of M...
>>>
>>> I like the N of M concept but there would be a LOT of change - especially
>>> for all the non-kernel event sources. The EOE would be the most
>>> seamless, but at a cost. My preference is to allow the 2 second 'timer'
>>> to be configurable.
>>
>> Agree with Burn, numbering the records coming up from the kernel is
>> going to be a real nightmare, and not something to consider lightly.
>> Especially when it sounds like we don't yet have a root cause for the
>> issue.
>
> A very long time ago, we had numbered records. But it was decided that
> there's no real point in it and we'd rather just save disk space.

With the current kernel code, adding numbered records is not something
to take lightly.

> I know that the kernel does not serialize the events headed for user
> space. But I'm curious how an event gets stuck and others can jump ahead
> while one that's already in flight can get hung for 4 seconds before its
> next record goes out?

Have you determined that the problem is the kernel? Initially it was
looking like it was a userspace issue; is that no longer the general
thought? Also, is there a reliable reproducer yet?
--
paul moore
www.paul-moore.com