Re: [PATCH ghak90 V8 07/16] audit: add contid support for signalling the audit daemon

Paul Moore <paul@xxxxxxxxxxxxxx> · Wed, 22 Apr 2020 13:24:10 -0400

On Fri, Apr 17, 2020 at 6:26 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> Paul Moore <paul@xxxxxxxxxxxxxx> writes:
> > On Thu, Apr 16, 2020 at 4:36 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> >> Paul Moore <paul@xxxxxxxxxxxxxx> writes:
> >> > On Mon, Mar 30, 2020 at 1:49 PM Richard Guy Briggs <rgb@xxxxxxxxxx> wrote:
> >> >> On 2020-03-30 13:34, Paul Moore wrote:
> >> >> > On Mon, Mar 30, 2020 at 12:22 PM Richard Guy Briggs <rgb@xxxxxxxxxx> wrote:
> >> >> > > On 2020-03-30 10:26, Paul Moore wrote:
> >> >> > > > On Mon, Mar 30, 2020 at 9:47 AM Richard Guy Briggs <rgb@xxxxxxxxxx> wrote:
> >> >> > > > > On 2020-03-28 23:11, Paul Moore wrote:
> >> >> > > > > > On Tue, Mar 24, 2020 at 5:02 PM Richard Guy Briggs <rgb@xxxxxxxxxx> wrote:
> >> >> > > > > > > On 2020-03-23 20:16, Paul Moore wrote:
> >> >> > > > > > > > On Thu, Mar 19, 2020 at 6:03 PM Richard Guy Briggs <rgb@xxxxxxxxxx> wrote:
> >> >> > > > > > > > > On 2020-03-18 18:06, Paul Moore wrote:
> >> >
> >> > ...
> >> >
> >> >> > > Well, every time a record gets generated, *any* record gets generated,
> >> >> > > we'll need to check for which audit daemons this record is in scope and
> >> >> > > generate a different one for each depending on the content and whether
> >> >> > > or not the content is influenced by the scope.
> >> >> >
> >> >> > That's the problem right there - we don't want to have to generate a
> >> >> > unique record for *each* auditd on *every* record.  That is a recipe
> >> >> > for disaster.
> >> >> >
> >> >> > Solving this for all of the known audit records is not something we
> >> >> > need to worry about in depth at the moment (although giving it some
> >> >> > casual thought is not a bad thing), but solving this for the audit
> >> >> > container ID information *is* something we need to worry about right
> >> >> > now.
> >> >>
> >> >> If you think that a different nested contid value string per daemon is
> >> >> not acceptable, then we are back to issuing a record that has only *one*
> >> >> contid listed without any nesting information.  This brings us back to
> >> >> the original problem of keeping *all* audit log history since the boot
> >> >> of the machine to be able to track the nesting of any particular contid.
> >> >
> >> > I'm not ruling anything out, except for the "let's just completely
> >> > regenerate every record for each auditd instance".
> >>
> >> Paul I am a bit confused about what you are referring to when you say
> >> regenerate every record.
> >>
> >> Are you saying that you don't want to repeat the sequence:
> >>         audit_log_start(...);
> >>         audit_log_format(...);
> >>         audit_log_end(...);
> >> for every nested audit daemon?
> >
> > If it can be avoided yes.  Audit performance is already not-awesome,
> > this would make it even worse.
>
> As far as I can see not repeating sequences like that is fundamental
> for making this work at all.  Just because only the audit subsystem
> should know about one or multiple audit daemons.  Nothing else should
> care.

Yes, exactly, this has been mentioned in the past.  Both the
performance hit and the code complication in the caller are things we
must avoid.

> >> Or are you saying that you would like to literraly want to send the same
> >> skb to each of the nested audit daemons?
> >
> > Ideally we would reuse the generated audit messages as much as
> > possible.  Less work is better.  That's really my main concern here,
> > let's make sure we aren't going to totally tank performance when we
> > have a bunch of nested audit daemons.
>
> So I think there are two parts of this answer.  Assuming we are talking
> about nesting audit daemons in containers we will have different
> rulesets and I expect most of the events for a nested audit daemon won't
> be of interest to the outer audit daemon.

Yes, this is another thing that Richard and I have discussed in the
past.  We will basically need to create per-daemon queues, rules,
tracking state, etc.; that is easy enough.  What will be slightly more
tricky is the part where we apply the filters to the individual
records and decide if that record is valid/desired for a given daemon.
I think it can be done without too much pain, and any changes to the
callers, but it will require a bit of work to make sure it is done
well and that records are needlessly duplicated in the kernel.

> Beyond that it should be very straight forward to keep a pointer and
> leave the buffer as a scatter gather list until audit_log_end
> and translate pids, and rewrite ACIDs attributes in audit_log_end
> when we build the final packet.  Either through collaboration with
> audit_log_format or a special audit_log command that carefully sets
> up the handful of things that need that information.

In order to maximize record re-use I think we will want to hold off on
assembling the final packet until it is sent to the daemons in the
kauditd thread.  We'll also likely need to create special
audit_log_XXX functions to capture fields which we know will need
translation, e.g. ACID information.  (the reason for the new
audit_log_XXX functions would be to mark the new sg element and ensure
the buffer is handled correctly)

Regardless of the details, I think the scatter gather approach is the
key here - that seems like the best design idea I've seen thus far.
It enables us to replace portions of the record as needed ... and
possibly use the existing skb cow stuff ... it has been a while, but
does the skb cow functions handle scatter gather skbs or do they need
to be linear?

> Hmm.  I am seeing that we send skbs to kauditd and then kauditd
> sends those skbs to userspace.  I presume that is primary so that
> sending messages to userspace does not block the process being audited.
> Plus a little bit so that the retry logic will work.

Long story short, it's a poor design.  I'm not sure who came up with
it, but I have about a 1000 questions that are variations on "why did
this seem like a good idea?".

I expect the audit_buffer definition to change significantly during
the nested auditd work.

> I think the naive implementation would be to simply have 1 kauditd
> per auditd (strictly and audit context/namespace).  Although that can be
> optimized if that is a problem.
>
> Beyond that I think we would need to look at profiles to really
> understand where the bottlenecks are.

Agreed.  This is a hidden implementation detail that doesn't affect
the userspace API or the in-kernel callers.  The first approach can be
simple and we can complicate it as needed in future versions.

> > I'm open to any ideas people may have.  We have a problem, let's solve
> > it.
>
> It definitely makes sense to look ahead to having audit daemons running
> in containers, but in the grand scheme of things that is a nice to have.
> Probably something we will and should get to, but we have lived a long
> time without auditd running in containers so I expect we can live a
> while longer.

It looks like you are confusing my concern.  I'm not pushing Richard
to implement support for this in the current patchset, I'm pushing
Richard to consider the design aspect of having multiple audit daemons
so that we don't code ourselves into a corner with the audit record
changes he is proposing.  The audit record format is part of the
kernel/userspace API and as a result requires great care when
modifying/extending/etc.

-- 
paul moore
www.paul-moore.com