On Thu, 2024-08-08 at 20:28 -0400, Paul Moore wrote:
> On Thu, Aug 8, 2024 at 7:43 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > On Thu, 2024-08-08 at 17:12 -0400, Paul Moore wrote:
> > > On Thu, Aug 8, 2024 at 1:11 PM Jan Kara <jack@xxxxxxx> wrote:
> > > > On Thu 08-08-24 12:36:07, Christian Brauner wrote:
> > > > > On Wed, Aug 07, 2024 at 10:36:58AM GMT, Jeff Layton wrote:
> > > > > > On Wed, 2024-08-07 at 16:26 +0200, Christian Brauner wrote:
> > > > > > > > +static struct dentry *lookup_fast_for_open(struct nameidata *nd, int open_flag)
> > > > > > > > +{
> > > > > > > > +	struct dentry *dentry;
> > > > > > > > +
> > > > > > > > +	if (open_flag & O_CREAT) {
> > > > > > > > +		/* Don't bother on an O_EXCL create */
> > > > > > > > +		if (open_flag & O_EXCL)
> > > > > > > > +			return NULL;
> > > > > > > > +
> > > > > > > > +		/*
> > > > > > > > +		 * FIXME: If auditing is enabled, then we'll have to unlazy to
> > > > > > > > +		 * use the dentry. For now, don't do this, since it shifts
> > > > > > > > +		 * contention from parent's i_rwsem to its d_lockref spinlock.
> > > > > > > > +		 * Reconsider this once dentry refcounting handles heavy
> > > > > > > > +		 * contention better.
> > > > > > > > +		 */
> > > > > > > > +		if ((nd->flags & LOOKUP_RCU) && !audit_dummy_context())
> > > > > > > > +			return NULL;
> > > > > > >
> > > > > > > Hm, the audit_inode() on the parent is done independent of whether the
> > > > > > > file was actually created or not. But the audit_inode() on the file
> > > > > > > itself is only done when it was actually created. Imho, there's no need
> > > > > > > to do audit_inode() on the parent when we immediately find that file
> > > > > > > already existed. If we accept that then this makes the change a lot
> > > > > > > simpler.
> > > > > > >
> > > > > > > The inconsistency would partially remain though. When the file doesn't
> > > > > > > exist audit_inode() on the parent is called but by the time we've
> > > > > > > grabbed the inode lock someone else might already have created the file
> > > > > > > and then again we wouldn't audit_inode() on the file but we would have
> > > > > > > on the parent.
> > > > > > >
> > > > > > > I think that's fine. But if that's bothersome the more aggressive thing
> > > > > > > to do would be to pull that audit_inode() on the parent further down
> > > > > > > after we created the file. Imho, that should be fine?...
> > > > > > >
> > > > > > > See https://gitlab.com/brauner/linux/-/commits/vfs.misc.jeff/?ref_type=heads
> > > > > > > for a completely untested draft of what I mean.
> > > > > >
> > > > > > Yeah, that's a lot simpler. That said, my experience when I've worked
> > > > > > with audit in the past is that people who are using it are _very_
> > > > > > sensitive to changes of when records get emitted or not. I don't like
> > > > > > this, because I think the rules here are ad-hoc and somewhat arbitrary,
> > > > > > but keeping everything working exactly the same has been my MO whenever
> > > > > > I have to work in there.
> > > > > >
> > > > > > If a certain access pattern suddenly generates a different set of
> > > > > > records (or some are missing, as would be in this case), we might get
> > > > > > bug reports about this. I'm ok with simplifying this code in the way
> > > > > > you suggest, but we may want to do it in a patch on top of mine, to
> > > > > > make it simple to revert later if that becomes necessary.
> > > > >
> > > > > Fwiw, even with the rearranged checks in v3 of the patch audit records
> > > > > will be dropped because we may find a positive dentry but the path may
> > > > > have trailing slashes. At that point we just return without audit
> > > > > whereas before we always would've done that audit.
> > > > >
> > > > > Honestly, we should move that audit event as right now it's just really
> > > > > weird and see if that works. Otherwise the change is somewhat horrible
> > > > > complicating the already convoluted logic even more.
> > > > >
> > > > > So I'm appending the patches that I have on top of your patch in
> > > > > vfs.misc. Can you (other as well ofc) take a look and tell me whether
> > > > > that's not breaking anything completely other than later audit events?
> > > >
> > > > The changes look good as far as I'm concerned but let me CC audit guys if
> > > > they have some thoughts regarding the change in generating audit event for
> > > > the parent. Paul, does it matter if open(O_CREAT) doesn't generate audit
> > > > event for the parent when we are failing open due to trailing slashes in
> > > > the pathname? Essentially we are speaking about moving:
> > > >
> > > > 	audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
> > > >
> > > > from open_last_lookups() into lookup_open().
> > >
> > > Thanks for adding the audit mailing list to the CC, Jan. I would ask
> > > for others to do the same when discussing changes that could impact
> > > audit (similar requests for the LSM framework, SELinux, etc.).
> > >
> > > The inode/path logging in audit is ... something. I have a
> > > longstanding todo item to go revisit the audit inode logging, both to
> > > fix some known bugs, and see what we can improve (I'm guessing quite a
> > > bit). Unfortunately, there is always something else which is burning
> > > a little bit hotter and I haven't been able to get to it yet.
> > >
> >
> > It is "something" alright. The audit logging just happens at strange
> > and inconvenient times vs. what else we're trying to do wrt pathwalking
> > and such. In particular here, the fact __audit_inode can block is what
> > really sucks.
> >
> > Since we're discussing it...
> >
> > ISTM that the inode/path logging here is something like a tracepoint.
> > In particular, we're looking to record a specific set of information at
> > specific points in the code. One of the big differences between them
> > however is that tracepoints don't block. The catch is that we can't
> > just drop messages if we run out of audit logging space, so that would
> > have to be handled reasonably.
>
> Yes, the buffer allocation is the tricky bit. Audit does preallocate
> some structs for tracking names which ideally should handle the vast
> majority of the cases, but yes, we need something to handle all of the
> corner cases too without having to resort to audit_panic().
>
> > I wonder if we could leverage the tracepoint infrastructure to help us
> > record the necessary info somehow? Copy the records into a specific
> > ring buffer, and then copy them out to the audit infrastructure in
> > task_work?
>
> I believe using task_work will cause a number of challenges for the
> audit subsystem as we try to bring everything together into a single
> audit event. We've had a lot of problems with io_uring doing similar
> things, some of which are still unresolved.
>
> > I don't have any concrete ideas here, but the path/inode audit code has
> > been a burden for a while now and it'd be good to think about how we
> > could do this better.
>
> I've got some grand ideas on how to cut down on a lot of our
> allocations and string generation in the critical path, not just with
> the inodes, but with audit records in general. Sadly I just haven't
> had the time to get to any of it.
>
> > > The general idea with audit is that you want to record the information
> > > both on success and failure. It's easy to understand the success
> > > case, as it is a record of what actually happened on the system, but
> > > you also want to record the failure case as it can provide some
> > > insight on what a process/user is attempting to do, and that can be
> > > very important for certain classes of users. I haven't dug into the
> > > patches in Christian's tree, but in general I think Jeff's guidance
> > > about not changing what is recorded in the audit log is probably good
> > > advice (there will surely be exceptions to that, but it's still good
> > > guidance).
> > >
> >
> > In this particular case, the question is:
> >
> > Do we need to emit a AUDIT_INODE_PARENT record when opening an existing
> > file, just because O_CREAT was set? We don't emit such a record when
> > opening without O_CREAT set.
>
> I'm not as current on the third-party security requirements as I used
> to be, but I do know that oftentimes when a file is created the parent
> directory is an important bit of information to have in the audit log.
>

Right. We'd still have that here since we have to unlazy to actually
create the file.

The question here is about the case where O_CREAT is set, but the file
already exists. Nothing is being created in that case, so do we need to
emit an audit record for the parent?
-- 
Jeff Layton <jlayton@xxxxxxxxxx>