Re: [PATCH v28 22/25] Audit: Add record for multiple process LSM attributes

Casey Schaufler <casey@xxxxxxxxxxxxxxxx> · Fri, 13 Aug 2021 11:48:26 -0700

On 8/13/2021 8:31 AM, Paul Moore wrote:
> On Thu, Aug 12, 2021 at 6:38 PM Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote:
>> On 8/12/2021 1:59 PM, Paul Moore wrote:
>>> On Wed, Jul 21, 2021 at 9:12 PM Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote:
>>>> Create a new audit record type to contain the subject information
>>>> when there are multiple security modules that require such data.
> ...
>
>>> The local
>>> audit context is a hack that is made necessary by the fact that we
>>> have to audit things which happen outside the scope of an executing
>>> task, e.g. the netfilter audit hooks, it should *never* be used when
>>> there is a valid task_struct.
>> In the existing audit code a "current context" is only needed for
>> syscall events, so that's the only case where it's allocated. Would
>> you suggest that I track down the non-syscall events that include
>> subj= fields and add allocate a "current context" for them? I looked
>> into doing that, and it wouldn't be simple.
> This is why the "local context" was created.  Prior to these stacking
> additions, and the audit container ID work, we never needed to group
> multiple audit records outside of a syscall context into a single
> audit event so passing a NULL context into audit_log_start() was
> reasonable.  The local context was designed as a way to generate a
> context for use in a local function scope to group multiple records,
> however, for reasons I'll get to below I'm now wondering if the local
> context approach is really workable ...

I haven't found a place where it didn't work. What is the concern?

>>> Hopefully that makes sense?
>> Yes, it makes sense. Methinks you may believe that the current context
>> is available more regularly than it actually is.
>>
>> I instrumented the audit event functions with:
>>
>>         WARN_ONCE(audit_context, "%s has context\n", __func__);
>>         WARN_ONCE(!audit_context, "%s lacks context\n", __func__);
>>
>> I only used local contexts where the 2nd WARN_ONCE was hit.
> What does your audit config look like?  Both the kernel command line
> and the output of 'auditctl -l' would be helpful.

On the fedora system:

BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-rc5stack+
root=/dev/mapper/fedora-root ro resume=/dev/mapper/fedora-swap
rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap lsm.debug

-a always,exit -F arch=b64 -S bpf -F key=testsuite-1628714321-EtlWIphW

On the Ubuntu system:

BOOT_IMAGE=/boot/vmlinuz-5.14.0-rc1stack+
root=UUID=39c25777-d413-4c2e-948c-dfa2bf259049 ro lsm.debug

No rules

> I'm beginning to suspect that you have the default
> we-build-audit-into-the-kernel-because-product-management-said-we-have-to-but-we-don't-actually-enable-it-at-runtime
> audit configuration that is de rigueur for many distros these days.

Yes, but I've also fiddled about with it so as to get better event coverage.
I've run the audit-testsuite, which has got to fiddle about with the audit
configuration.

> If that is the case, there are many cases where you would not see a
> NULL current->audit_context simply because the config never allocated
> one, see kernel/auditsc.c:audit_alloc().

I assume you mean that I *would* see a NULL current->audit_context
in the "event not enabled" case.

> If that is the case, I'm
> honestly a little surprised we didn't realize that earlier, especially
> given all the work/testing that Richard has done with the audit
> container ID bits, but then again he surely had a proper audit config
> during his testing so it wouldn't have appeared.
>
> Good times.

Indeed.

> Regardless, assuming that is the case we probably need to find an
> alternative to the local context approach as it currently works.  For
> reasons we already talked about, we don't want to use a local
> audit_context if there is the possibility for a proper
> current->audit_context, but we need to do *something* so that we can
> group these multiple events into a single record.

I tried a couple things, but neither was satisfactory.

> Since this is just occurring to me now I need a bit more time to think
> on possible solutions - all good ideas are welcome - but the first
> thing that pops into my head is that we need to augment
> audit_log_end() to potentially generated additional, associated
> records similar to what we do on syscall exit in audit_log_exit().

I looked into that. You need a place to save the timestamp
that doesn't disappear. That's the role the audit_context plays
now.

>   Of
> course the audit_log_end() changes would be much more limited than
> audit_log_exit(), just the LSM subject and audit container ID info,
> and even then we might want to limit that to cases where the ab->ctx
> value is NULL and let audit_log_exit() handle it otherwise.  We may
> need to store the event type in the audit_buffer during
> audit_log_start() so that we can later use that in audit_log_end() to
> determine what additional records are needed.
>
> Regardless, let's figure out why all your current->audit_context
> values are NULL

That's what's maddening, and why I implemented audit_alloc_for_lsm().
They aren't all NULL. Sometimes current->audit_context is NULL,
sometimes it isn't, for the same event. I thought it might be a
question of the netlink interface being treated specially, but
that doesn't explain all the cases.

>  first (report back on your audit config please), I may
> be worrying about a hypothetical that isn't real.
>