Re: [PATCH v2] fs: try an opportunistic lookup for O_CREAT opens too

On Thu, 2024-08-08 at 21:22 -0400, Paul Moore wrote:
> On Thu, Aug 8, 2024 at 8:33 PM Jeff Layton <jlayton@xxxxxxxxxx>
> wrote:
> > On Thu, 2024-08-08 at 20:28 -0400, Paul Moore wrote:
> > > On Thu, Aug 8, 2024 at 7:43 PM Jeff Layton <jlayton@xxxxxxxxxx>
> > > wrote:
> > > > On Thu, 2024-08-08 at 17:12 -0400, Paul Moore wrote:
> > > > > On Thu, Aug 8, 2024 at 1:11 PM Jan Kara <jack@xxxxxxx> wrote:
> > > > > > On Thu 08-08-24 12:36:07, Christian Brauner wrote:
> > > > > > > On Wed, Aug 07, 2024 at 10:36:58AM GMT, Jeff Layton
> > > > > > > wrote:
> > > > > > > > On Wed, 2024-08-07 at 16:26 +0200, Christian Brauner
> > > > > > > > wrote:
> > > > > > > > > > +static struct dentry *lookup_fast_for_open(struct nameidata *nd, int open_flag)
> > > > > > > > > > +{
> > > > > > > > > > +       struct dentry *dentry;
> > > > > > > > > > +
> > > > > > > > > > +       if (open_flag & O_CREAT) {
> > > > > > > > > > +               /* Don't bother on an O_EXCL create */
> > > > > > > > > > +               if (open_flag & O_EXCL)
> > > > > > > > > > +                       return NULL;
> > > > > > > > > > +
> > > > > > > > > > +               /*
> > > > > > > > > > +                * FIXME: If auditing is enabled, then we'll have to unlazy to
> > > > > > > > > > +                * use the dentry. For now, don't do this, since it shifts
> > > > > > > > > > +                * contention from parent's i_rwsem to its d_lockref spinlock.
> > > > > > > > > > +                * Reconsider this once dentry refcounting handles heavy
> > > > > > > > > > +                * contention better.
> > > > > > > > > > +                */
> > > > > > > > > > +               if ((nd->flags & LOOKUP_RCU) && !audit_dummy_context())
> > > > > > > > > > +                       return NULL;
> > > > > > > > > 
> > > > > > > > > Hm, the audit_inode() on the parent is done
> > > > > > > > > independent of whether the
> > > > > > > > > file was actually created or not. But the
> > > > > > > > > audit_inode() on the file
> > > > > > > > > itself is only done when it was actually created.
> > > > > > > > > Imho, there's no need
> > > > > > > > > to do audit_inode() on the parent when we immediately
> > > > > > > > > find that file
> > > > > > > > > already existed. If we accept that then this makes
> > > > > > > > > the change a lot
> > > > > > > > > simpler.
> > > > > > > > > 
> > > > > > > > > The inconsistency would partially remain though. When
> > > > > > > > > the file doesn't
> > > > > > > > > exist audit_inode() on the parent is called but by
> > > > > > > > > the time we've
> > > > > > > > > grabbed the inode lock someone else might already
> > > > > > > > > have created the file
> > > > > > > > > and then again we wouldn't audit_inode() on the file
> > > > > > > > > but we would have
> > > > > > > > > on the parent.
> > > > > > > > > 
> > > > > > > > > I think that's fine. But if that's bothersome the
> > > > > > > > > more aggressive thing
> > > > > > > > > to do would be to pull that audit_inode() on the
> > > > > > > > > parent further down
> > > > > > > > > after we created the file. Imho, that should be
> > > > > > > > > fine?...
> > > > > > > > > 
> > > > > > > > > See
> > > > > > > > > https://gitlab.com/brauner/linux/-/commits/vfs.misc.jeff/?ref_type=heads
> > > > > > > > > for a completely untested draft of what I mean.
> > > > > > > > 
> > > > > > > > Yeah, that's a lot simpler. That said, my experience
> > > > > > > > when I've worked
> > > > > > > > with audit in the past is that people who are using it
> > > > > > > > are _very_
> > > > > > > > sensitive to changes of when records get emitted or
> > > > > > > > not. I don't like
> > > > > > > > this, because I think the rules here are ad-hoc and
> > > > > > > > somewhat arbitrary,
> > > > > > > > but keeping everything working exactly the same has
> > > > > > > > been my MO whenever
> > > > > > > > I have to work in there.
> > > > > > > > 
> > > > > > > > If a certain access pattern suddenly generates a
> > > > > > > > different set of
> > > > > > > > records (or some are missing, as would be in this
> > > > > > > > case), we might get
> > > > > > > > bug reports about this. I'm ok with simplifying this
> > > > > > > > code in the way
> > > > > > > > you suggest, but we may want to do it in a patch on top
> > > > > > > > of mine, to
> > > > > > > > make it simple to revert later if that becomes
> > > > > > > > necessary.
> > > > > > > 
> > > > > > > Fwiw, even with the rearranged checks in v3 of the patch
> > > > > > > audit records
> > > > > > > will be dropped because we may find a positive dentry but
> > > > > > > the path may
> > > > > > > have trailing slashes. At that point we just return
> > > > > > > without audit
> > > > > > > whereas before we always would've done that audit.
> > > > > > > 
> > > > > > > Honestly, we should move that audit event as right now
> > > > > > > it's just really
> > > > > > > weird and see if that works. Otherwise the change is
> > > > > > > somewhat horrible
> > > > > > > complicating the already convoluted logic even more.
> > > > > > > 
> > > > > > > So I'm appending the patches that I have on top of your
> > > > > > > patch in
> > > > > > > vfs.misc. Can you (other as well ofc) take a look and
> > > > > > > tell me whether
> > > > > > > that's not breaking anything completely other than later
> > > > > > > audit events?
> > > > > > 
> > > > > > The changes look good as far as I'm concerned but let me CC
> > > > > > audit guys if
> > > > > > they have some thoughts regarding the change in generating
> > > > > > audit event for
> > > > > > the parent. Paul, does it matter if open(O_CREAT) doesn't
> > > > > > generate audit
> > > > > > event for the parent when we are failing open due to
> > > > > > trailing slashes in
> > > > > > the pathname? Essentially we are speaking about moving:
> > > > > > 
> > > > > >         audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
> > > > > > 
> > > > > > from open_last_lookups() into lookup_open().
> > > > > 
> > > > > Thanks for adding the audit mailing list to the CC, Jan.  I
> > > > > would ask
> > > > > for others to do the same when discussing changes that could
> > > > > impact
> > > > > audit (similar requests for the LSM framework, SELinux,
> > > > > etc.).
> > > > > 
> > > > > The inode/path logging in audit is ... something.  I have a
> > > > > longstanding todo item to go revisit the audit inode logging,
> > > > > both to
> > > > > fix some known bugs, and see what we can improve (I'm
> > > > > guessing quite a
> > > > > bit).  Unfortunately, there is always something else which is
> > > > > burning
> > > > > a little bit hotter and I haven't been able to get to it yet.
> > > > > 
> > > > 
> > > > It is "something" alright. The audit logging just happens at
> > > > strange
> > > > and inconvenient times vs. what else we're trying to do wrt
> > > > pathwalking
> > > > and such. In particular here, the fact __audit_inode can block
> > > > is what
> > > > really sucks.
> > > > 
> > > > Since we're discussing it...
> > > > 
> > > > ISTM that the inode/path logging here is something like a
> > > > tracepoint.
> > > > In particular, we're looking to record a specific set of
> > > > information at
> > > > specific points in the code. One of the big differences between
> > > > them
> > > > however is that tracepoints don't block.  The catch is that we
> > > > can't
> > > > just drop messages if we run out of audit logging space, so
> > > > that would
> > > > have to be handled reasonably.
> > > 
> > > Yes, the buffer allocation is the tricky bit.  Audit does
> > > preallocate
> > > some structs for tracking names which ideally should handle the
> > > vast
> > > majority of the cases, but yes, we need something to handle all
> > > of the
> > > corner cases too without having to resort to audit_panic().
> > > 
> > > > I wonder if we could leverage the tracepoint infrastructure to
> > > > help us
> > > > record the necessary info somehow? Copy the records into a
> > > > specific
> > > > ring buffer, and then copy them out to the audit infrastructure
> > > > in
> > > > task_work?
> > > 
> > > I believe using task_work will cause a number of challenges for
> > > the
> > > audit subsystem as we try to bring everything together into a
> > > single
> > > audit event.  We've had a lot of problems with io_uring doing
> > > similar
> > > things, some of which are still unresolved.
> > > 
> > > > I don't have any concrete ideas here, but the path/inode audit
> > > > code has
> > > > been a burden for a while now and it'd be good to think about
> > > > how we
> > > > could do this better.
> > > 
> > > I've got some grand ideas on how to cut down on a lot of our
> > > allocations and string generation in the critical path, not just
> > > with
> > > the inodes, but with audit records in general.  Sadly I just
> > > haven't
> > > had the time to get to any of it.
> > > 
> > > > > The general idea with audit is that you want to record the
> > > > > information
> > > > > both on success and failure.  It's easy to understand the
> > > > > success
> > > > > case, as it is a record of what actually happened on the
> > > > > system, but
> > > > > you also want to record the failure case as it can provide
> > > > > some
> > > > > insight on what a process/user is attempting to do, and that
> > > > > can be
> > > > > very important for certain classes of users.  I haven't dug
> > > > > into the
> > > > > patches in Christian's tree, but in general I think Jeff's
> > > > > guidance
> > > > > about not changing what is recorded in the audit log is
> > > > > probably good
> > > > > advice (there will surely be exceptions to that, but it's
> > > > > still good
> > > > > guidance).
> > > > > 
> > > > 
> > > > In this particular case, the question is:
> > > > 
> > > > Do we need to emit a AUDIT_INODE_PARENT record when opening an
> > > > existing
> > > > file, just because O_CREAT was set? We don't emit such a record
> > > > when
> > > > opening without O_CREAT set.
> > > 
> > > I'm not as current on the third-party security requirements as I
> > > used
> > > to be, but I do know that oftentimes when a file is created the
> > > parent
> > > directory is an important bit of information to have in the audit
> > > log.
> > > 
> > 
> > Right. We'd still have that here since we have to unlazy to
> > actually
> > create the file.
> > 
> > The question here is about the case where O_CREAT is set, but the
> > file
> > already exists. Nothing is being created in that case, so do we
> > need to
> > emit an audit record for the parent?
> 
> As long as the full path information is present in the existing
> file's
> audit record it should be okay.
> 

O_CREAT is ignored when the dentry already exists, so doing the same
thing that we do when O_CREAT isn't set seems reasonable.

We do call audit_inode() in do_open(), which would apply in this case:

        if (!(file->f_mode & FMODE_CREATED))
                audit_inode(nd->name, nd->path.dentry, 0);

That should have the necessary path info. If that's the case, then I
think Christian's cleanup series on top of mine should be OK. I think
that the only thing that would be missing is the AUDIT_INODE_PARENT
record for the directory in the case where the dentry already exists,
which should be superfluous.

ISTR that Red Hat has a pretty extensive testsuite for audit. We might
want to get them to run their tests on Christian's changes to be sure
there are no surprises, if they are amenable.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>




