> On Mar 24, 2022, at 11:52 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Fri, 2022-03-25 at 00:35 +0000, Chuck Lever III wrote: >> >> >>> On Mar 24, 2022, at 6:51 PM, Trond Myklebust >>> <trondmy@xxxxxxxxxxxxxxx> wrote: >>> >>> On Thu, 2022-03-24 at 18:08 -0400, Chuck Lever wrote: >>>> There have been reports of races that cause NFSv4 OPEN(CREATE) to >>>> return an error even though the requested file was created. NFSv4 >>>> does not seem to provide a status code for this case. >>>> >>>> There are plenty of things that can go wrong between the >>>> vfs_create() call in do_nfsd_create() and nfs4_vfs_get_file(), >>>> but >>>> one of them is another client looking up and modifying the file's >>>> mode bits in that window. If that happens, the creator might lose >>>> access to the file before it can be opened. >>>> >>>> Instead of opening the newly created file in >>>> nfsd4_process_open2(), >>>> open it as soon as the file is created, and leave it sitting in >>>> the >>>> file cache. >>>> >>>> This patch is not optimal, of course - we should replace the >>>> vfs_create() call in do_nfsd_create() with a call that creates >>>> the >>>> file and returns a "struct file *" that can be planted >>>> immediately >>>> in nf->nf_file. >>>> >>>> But first, I would like to hear opinions about the approach >>>> suggested above. >>>> >>>> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382 >>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>>> --- >>>> fs/nfsd/vfs.c | 9 +++++++++ >>>> 1 file changed, 9 insertions(+) >>>> >>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c >>>> index 02a544645b52..80b568fa12f1 100644 >>>> --- a/fs/nfsd/vfs.c >>>> +++ b/fs/nfsd/vfs.c >>>> @@ -1410,6 +1410,7 @@ do_nfsd_create(struct svc_rqst *rqstp, >>>> struct >>>> svc_fh *fhp, >>>> __be32 err; >>>> int host_err; >>>> __u32 v_mtime=0, v_atime=0; >>>> + struct nfsd_file *nf; >>>> >>>> /* XXX: Shouldn't this be nfserr_inval? */ >>>> err = nfserr_perm; >>>> @@ -1535,6 +1536,14 @@ do_nfsd_create(struct svc_rqst *rqstp, >>>> struct >>>> svc_fh *fhp, >>>> iap->ia_atime.tv_nsec = 0; >>>> } >>>> >>>> + /* Attempt to instantiate a filecache entry while we >>>> still >>>> hold >>>> + * the parent's inode mutex. */ >>>> + err = nfsd_file_acquire(rqstp, resfhp, NFSD_MAY_WRITE, >>>> &nf); >>>> + if (err) >>>> + /* This would be bad */ >>>> + goto out; >>>> + nfsd_file_put(nf); >>> >>> Why rely on the file cache to carry the nfsd_file? Isn't it much >>> easier >>> just to pass it back directly to the caller? >> >> My thought was that the "struct file *" has to end up in the >> filecache anyway, even in the NFSv3 case. If I do this right, >> I can avoid the subsequent call to nfsd_open_verified(), which >> seems to be the source of the racy chmod behavior I mentioned >> above. That might help NFSv3 too, which would need to recreate >> the "struct file *" in nfsd_read() or nfsd_write() anyway. >> >>> There are only 2 callers of do_nfsd_create(), so you'd have >>> nfsd3_proc_create() that will just call nfsd_file_put() as per >>> above, >>> whereas the NFSv4 specific do_open_lookup() could just stash it in >>> the >>> struct nfsd4_open so that it can get passed into >>> do_open_permission() >>> and eventually nfsd4_process_open2(). >> >> Neither nfsd4_process_open2() or do_open_permission() seem >> directly interested in the nfsd_file --- it's all under the >> covers, in nfs4_get_vfs_file(). But yes, a "struct nfsd4_open" >> is passed to and visible in nfs4_get_vfs_file(), as is the >> open->op_created boolean. > > Having a nfsd_file in the file cache doesn't prevent anyone from > changing the mode, owner and share access locks on the file before you > can do the next set of permissions checks after dropping locks, It's not meant to prevent those changes, but merely to ensure that the open "struct file *" that was initially created is the one used eventually in nfsd4_process_open2(). We don't want to have to open one there -- it could be too late. > nor > does it even guarantee that you will still have an open descriptor when > the call to nfsd4_process_open2() occurs. Agreed, that's the rub. If there's no way to get the filecache to always preserve that "struct file *", then explicitly returning it from do_nfsd_create() seems like the only choice. So now the problem is which VFS API should do_nfsd_create() use instead of vfs_create() ? I haven't yet found a "create a file" API that is EXPORTed and takes a dentry and returns a "struct file *". At this point I'm not sure one currently exists. -- Chuck Lever