RE: [PATCH] ceph: is_root_ceph_dentry() cleanup

Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx> · Tue, 11 Feb 2025 00:08:21 +0000

On Wed, 2025-01-29 at 01:12 +0000, Al Viro wrote:
> On Tue, Jan 28, 2025 at 11:27:05PM +0000, Viacheslav Dubeyko wrote:
> 
> > I assume that you imply this code:
> > 
> > 	/* can we conclude ENOENT locally? */
> > 	if (d_really_is_negative(dentry)) {
> > 		struct ceph_inode_info *ci = ceph_inode(dir);
> > 		struct ceph_dentry_info *di = ceph_dentry(dentry);
> > 
> > 		spin_lock(&ci->i_ceph_lock);
> > 		doutc(cl, " dir %llx.%llx flags are 0x%lx\n",
> > 		      ceph_vinop(dir), ci->i_ceph_flags);
> > 		if (strncmp(dentry->d_name.name,
> > 			    fsc->mount_options->snapdir_name,
> > 			    dentry->d_name.len) &&
> > 		    !is_root_ceph_dentry(dir, dentry) &&
> > 		    ceph_test_mount_opt(fsc, DCACHE) &&
> > 		    __ceph_dir_is_complete(ci) &&
> > 		    __ceph_caps_issued_mask_metric(ci, CEPH_CAP_FILE_SHARED,
> > 1)) {
> > 			__ceph_touch_fmode(ci, mdsc, CEPH_FILE_MODE_RD);
> > 			spin_unlock(&ci->i_ceph_lock);
> > 			doutc(cl, " dir %llx.%llx complete, -ENOENT\n",
> > 			      ceph_vinop(dir));
> > 			d_add(dentry, NULL);
> > 			di->lease_shared_gen = atomic_read(&ci->i_shared_gen);
> > 			return NULL;
> > 		}
> > 		spin_unlock(&ci->i_ceph_lock);
> > 	}
> > 
> > Am I correct? So, how can we rework this code if it's wrong? What is your
> > vision? Do you mean that it's dead code? How can we check it?
> 
> I mean that ->lookup() is called *ONLY* for a negative unhashed dentries.
> In other words, on a call from VFS that condition will always be true.
> That part is easily provable; what is harder to reason about is the
> direct call of ceph_lookup() from ceph_handle_notrace_create().
> 
> The callers of that thing (ceph_mknod(), ceph_symlink() and ceph_mkdir())
> are all guaranteed that dentry will be negative when they are called.
> The hard-to-reason-about part is the call of ceph_mdsc_do_request()
> directly preceding the calls of ceph_handle_notrace_create().
> 
> Can ceph_mdsc_do_request() return 0, with req->r_reply_info.head->is_dentry
> being false *AND* a call of splice_dentry() made by ceph_fill_trace() called
> by ceph_mdsc_do_request()?
> 
> AFAICS, there are 3 calls of splice_dentry(); two of them happen under
> explicit check for ->is_dentry and thus are not interesting for our
> purposes.  The third one, though, could be hit with ->is_dentry being
> false and ->r_op being CEPH_MDS_OP_MKSNAP.  That is not impossible
> from ceph_mkdir(), as far as I can tell, and I don't understand the
> details well enough to tell whether it can actually happen.
> 
> Is it actually possible to hit ceph_handle_notrace_create() with
> a positive dentry?

Sorry for delay, I was busy with other issues.

I did run the xfstests and, as far as I can see, I had only negative dentries as
an input of ceph_lookup(). However, I did testing with presence of only one
CephFS kernel client and I didn't use snapshots. I guess, potentially, if we
have several CephFS kernel clients, for example, on different nodes working with
the same folder, then, maybe, MDS will behave in slightly different way. But
it's theory that I need to check.

In general, the MDS can issue NULL dentries to clients so that  they "know" the
direntry does not exist without having some capability (or lease) associated
with it. As far as I can see, if application repeatedly does stat of file, then
the kernel driver isn't repeatedly going out to the MDS to lookup that file. So,
I assume that this is the goal of this check and logic.

But, frankly speaking, I am not completely digested this piece of code and all
possible use-cases yet. :)

Thanks,
Slava.