Re: [PATCH v2] fs: try an opportunistic lookup for O_CREAT opens too

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 8, 2024 at 7:43 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> On Thu, 2024-08-08 at 17:12 -0400, Paul Moore wrote:
> > On Thu, Aug 8, 2024 at 1:11 PM Jan Kara <jack@xxxxxxx> wrote:
> > > On Thu 08-08-24 12:36:07, Christian Brauner wrote:
> > > > On Wed, Aug 07, 2024 at 10:36:58AM GMT, Jeff Layton wrote:
> > > > > On Wed, 2024-08-07 at 16:26 +0200, Christian Brauner wrote:
> > > > > > > +static struct dentry *lookup_fast_for_open(struct nameidata *nd, int open_flag)
> > > > > > > +{
> > > > > > > +       struct dentry *dentry;
> > > > > > > +
> > > > > > > +       if (open_flag & O_CREAT) {
> > > > > > > +               /* Don't bother on an O_EXCL create */
> > > > > > > +               if (open_flag & O_EXCL)
> > > > > > > +                       return NULL;
> > > > > > > +
> > > > > > > +               /*
> > > > > > > +                * FIXME: If auditing is enabled, then we'll have to unlazy to
> > > > > > > +                * use the dentry. For now, don't do this, since it shifts
> > > > > > > +                * contention from parent's i_rwsem to its d_lockref spinlock.
> > > > > > > +                * Reconsider this once dentry refcounting handles heavy
> > > > > > > +                * contention better.
> > > > > > > +                */
> > > > > > > +               if ((nd->flags & LOOKUP_RCU) && !audit_dummy_context())
> > > > > > > +                       return NULL;
> > > > > >
> > > > > > Hm, the audit_inode() on the parent is done independent of whether the
> > > > > > file was actually created or not. But the audit_inode() on the file
> > > > > > itself is only done when it was actually created. Imho, there's no need
> > > > > > to do audit_inode() on the parent when we immediately find that file
> > > > > > already existed. If we accept that then this makes the change a lot
> > > > > > simpler.
> > > > > >
> > > > > > The inconsistency would partially remain though. When the file doesn't
> > > > > > exist audit_inode() on the parent is called but by the time we've
> > > > > > grabbed the inode lock someone else might already have created the file
> > > > > > and then again we wouldn't audit_inode() on the file but we would have
> > > > > > on the parent.
> > > > > >
> > > > > > I think that's fine. But if that's bothersome the more aggressive thing
> > > > > > to do would be to pull that audit_inode() on the parent further down
> > > > > > after we created the file. Imho, that should be fine?...
> > > > > >
> > > > > > See https://gitlab.com/brauner/linux/-/commits/vfs.misc.jeff/?ref_type=heads
> > > > > > for a completely untested draft of what I mean.
> > > > >
> > > > > Yeah, that's a lot simpler. That said, my experience when I've worked
> > > > > with audit in the past is that people who are using it are _very_
> > > > > sensitive to changes of when records get emitted or not. I don't like
> > > > > this, because I think the rules here are ad-hoc and somewhat arbitrary,
> > > > > but keeping everything working exactly the same has been my MO whenever
> > > > > I have to work in there.
> > > > >
> > > > > If a certain access pattern suddenly generates a different set of
> > > > > records (or some are missing, as would be in this case), we might get
> > > > > bug reports about this. I'm ok with simplifying this code in the way
> > > > > you suggest, but we may want to do it in a patch on top of mine, to
> > > > > make it simple to revert later if that becomes necessary.
> > > >
> > > > Fwiw, even with the rearranged checks in v3 of the patch audit records
> > > > will be dropped because we may find a positive dentry but the path may
> > > > have trailing slashes. At that point we just return without audit
> > > > whereas before we always would've done that audit.
> > > >
> > > > Honestly, we should move that audit event as right now it's just really
> > > > weird and see if that works. Otherwise the change is somewhat horrible
> > > > complicating the already convoluted logic even more.
> > > >
> > > > So I'm appending the patches that I have on top of your patch in
> > > > vfs.misc. Can you (other as well ofc) take a look and tell me whether
> > > > that's not breaking anything completely other than later audit events?
> > >
> > > The changes look good as far as I'm concerned but let me CC audit guys if
> > > they have some thoughts regarding the change in generating audit event for
> > > the parent. Paul, does it matter if open(O_CREAT) doesn't generate audit
> > > event for the parent when we are failing open due to trailing slashes in
> > > the pathname? Essentially we are speaking about moving:
> > >
> > >         audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
> > >
> > > from open_last_lookups() into lookup_open().
> >
> > Thanks for adding the audit mailing list to the CC, Jan.  I would ask
> > for others to do the same when discussing changes that could impact
> > audit (similar requests for the LSM framework, SELinux, etc.).
> >
> > The inode/path logging in audit is ... something.  I have a
> > longstanding todo item to go revisit the audit inode logging, both to
> > fix some known bugs, and see what we can improve (I'm guessing quite a
> > bit).  Unfortunately, there is always something else which is burning
> > a little bit hotter and I haven't been able to get to it yet.
> >
>
> It is "something" alright. The audit logging just happens at strange
> and inconvenient times vs. what else we're trying to do wrt pathwalking
> and such. In particular here, the fact __audit_inode can block is what
> really sucks.
>
> Since we're discussing it...
>
> ISTM that the inode/path logging here is something like a tracepoint.
> In particular, we're looking to record a specific set of information at
> specific points in the code. One of the big differences between them
> however is that tracepoints don't block.  The catch is that we can't
> just drop messages if we run out of audit logging space, so that would
> have to be handled reasonably.

Yes, the buffer allocation is the tricky bit.  Audit does preallocate
some structs for tracking names which ideally should handle the vast
majority of the cases, but yes, we need something to handle all of the
corner cases too without having to resort to audit_panic().

> I wonder if we could leverage the tracepoint infrastructure to help us
> record the necessary info somehow? Copy the records into a specific
> ring buffer, and then copy them out to the audit infrastructure in
> task_work?

I believe using task_work will cause a number of challenges for the
audit subsystem as we try to bring everything together into a single
audit event.  We've had a lot of problems with io_uring doing similar
things, some of which are still unresolved.

> I don't have any concrete ideas here, but the path/inode audit code has
> been a burden for a while now and it'd be good to think about how we
> could do this better.

I've got some grand ideas on how to cut down on a lot of our
allocations and string generation in the critical path, not just with
the inodes, but with audit records in general.  Sadly I just haven't
had the time to get to any of it.

> > The general idea with audit is that you want to record the information
> > both on success and failure.  It's easy to understand the success
> > case, as it is a record of what actually happened on the system, but
> > you also want to record the failure case as it can provide some
> > insight on what a process/user is attempting to do, and that can be
> > very important for certain classes of users.  I haven't dug into the
> > patches in Christian's tree, but in general I think Jeff's guidance
> > about not changing what is recorded in the audit log is probably good
> > advice (there will surely be exceptions to that, but it's still good
> > guidance).
> >
>
> In this particular case, the question is:
>
> Do we need to emit a AUDIT_INODE_PARENT record when opening an existing
> file, just because O_CREAT was set? We don't emit such a record when
> opening without O_CREAT set.

I'm not as current on the third-party security requirements as I used
to be, but I do know that oftentimes when a file is created the parent
directory is an important bit of information to have in the audit log.

-- 
paul-moore.com





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux