On Tue, Feb 11, 2025 at 07:32:16PM +0000, Viacheslav Dubeyko wrote: > > The really unpleasant question is whether ceph_handle_notrace_create() could > > end up feeding an already-positive dentry to direct call of ceph_lookup()... > > We have ceph_handle_notrace_create() call in several methods: > (1) ceph_mknod() > (2) ceph_symlink() > (3) ceph_mkdir() > (4) ceph_atomic_open() > > Every time we create object at first and, then, we call > ceph_handle_notrace_create() if creation was successful. We have such pattern: > > req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS); > > <skipped> > > err = ceph_mdsc_do_request(mdsc, dir, req); > if (!err && !req->r_reply_info.head->is_dentry) > err = ceph_handle_notrace_create(dir, dentry); > > And ceph_lookup() has such logic: > > if (d_really_is_negative(dentry)) { > <execute logic> > if (-ENOENT) > return NULL; > } > > req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS); > > <skipped> > > err = ceph_mdsc_do_request(mdsc, NULL, req); > > So, we have two different type of requests here: > (1) ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_MKNOD, USE_AUTH_MDS) > (2) ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS) > > The first request creates an object on MDS side and second one checks that this > object exists on MDS side by lookup. I assume that first request simply reports > success or failure of object creation. And only second one can extract metadata > from MDS side. If only... The first request may return that metadata, in which case we'll get splice_dentry() called by ceph_fill_trace() when reply arrives. Note that calls of ceph_handle_notrace_create() are conditional. Intent is as you've described and normally that's how it works, but that assumes sane reply. Which is not a good assumption to make, when it comes to memory corruption on client... I _think_ the conditions are sufficient. There are 4 callers of ceph_handle_notrace_create(). All of them require is_dentry in reply to be false; the one in ceph_mkdir() requires both is_dentry and is_target to be false. ceph_fill_trace() does not call splice_dentry() when both is_dentry and is_target are false, so we are only interested in the mknod/symlink/atomic_open callers. Past the handling of !is_dentry && !is_target case ceph_fill_trace() has a large if (is_dentry); not a concern for us. Next comes if (is_target) where we deal with ceph_fill_inode(); no calls of splice_dentry() in it. After that we have if (is_dentry && ....) { ... splice_dentry() call possible here ... } else if ((req->r_op == CEPH_MDS_OP_LOOKUPSNAP || req->r_op == CEPH_MDS_OP_MKSNAP) && test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags) && !test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) { ... splice_dentry() call possible here ... } else if (is_dentry && ...) { ... } Since we only care about !is_dentry case, that leaves the middle part. LOOKUPSNAP can come from ceph_lookup() or ceph_d_revalidate(), neither of which call ceph_handle_notrace_create(). MKSNAP comes only from ceph_mkdir(), which _does_ call ceph_handle_notrace_create(), but only in case !is_dentry && !is_target, so that call of splice_dentry() is not going to happen there. *IF* the analysis above is correct, we can rely upon the ceph_lookup() always getting a negative dentry and that if (d_really_is_negative(dentry)) is always taken. But I could be missing something subtle in there ;-/