On Thu, May 12, 2022 at 01:46:12PM +0530, Dharmendra Hans wrote: > On Thu, May 12, 2022 at 1:00 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote: > > > > On Wed, May 11, 2022 at 01:21:13PM -0400, Vivek Goyal wrote: > > > On Wed, May 11, 2022 at 11:40:59AM +0200, Miklos Szeredi wrote: > > > > On Thu, 5 May 2022 at 21:59, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote: > > > > > > > > > Oh, I have no issues with the intent. I will like to see cut in network > > > > > traffic too (if we can do this without introducing problems). My primary > > > > > interest is that this kind of change should benefit virtiofs as well. > > > > > > > > One issue with that appears to be checking permissions. AFAIU this > > > > patchset only enables the optimization if default_permissions is > > > > turned off (i.e. all permission checking is done by the server). But > > > > virtiofs uses the default_permissions model. > > > > > > IIUC, only 3rd patch mentions that default_permission should be turned > > > off. IOW, first patch where lookup + create + open is a single operation > > > and second patch which does "lookup + open" in a single operation does > > > not seem to require that default_permissions are not in effect. > > > > > > So if first two patches work fine, I think virtiofs should benefit too. > > > (IMHO, 3rd patch is too hacky anyway) > > > > > > W.r.t permission checks, looks like may_open() will finally be called > > > after ->atomic_open(). So even if we open the file, we should still be > > > able to check whether we have permissions to open the file or not > > > after the fact. > > > > > > fs/namei.c > > > > > > path_openat() > > > { > > > open_last_lookups() <--- This calls ->atomic_open() > > > do_open() <--- This calls may_open() > > > } > > > > Actually I am not sure about it. I was playing with > > > > open(foo.txt, O_CREAT | O_RDWR, O_IRUSR) > > > > This succeeds if file is newly created but if file already existed, this > > fails with -EACCESS > > > > So man 2 open says following. Thanks to Andy Price for pointing me to it. > > > > Note that mode applies only to future accesses of the newly cre‐ > > ated file; the open() call that creates a read-only file may > > well return a read/write file descriptor. > > > > > > Now I am wondering how will it look like with first patch. Assume file > > already exists on the server (But there is no negative dentry present) > > and I do following. And assume file only has read permission for user > > and I am trying to open it read-write. > > > > open(foo.txt, O_CREAT | O_RDWR, O_IRUSR) > > > > In normal circumstances, user will expect -EACCESS as file is read-only > > and user is trying to open it read-write. > > > > I am wondering how will it look like with this first patch. > > > > Current fuse ->atomic_open() looks up the dentry and does not open > > the file if dentry is positive. > > > > New implementation will skip lookup and open the file anyway and > > set file->f_mode |= FMODE_CREATED; (First patch in series) > > > > So first of all this seems wrong. I thought FMODE_CREATED should be > > set only if file was newly created. Is that a correct understanding. > > You are right. we should mark in f_mode only if the file was actually created. > Thanks for catching this. Appreciate your efforts:) > > > > > And I am looking at do_open() code. It does bunch of things based > > on FMODE_CREATED flag. One of the things it does is reset acc_mode =0 > > > > if (file->f_mode & FMODE_CREATED) { > > /* Don't check for write permission, don't truncate */ > > open_flag &= ~O_TRUNC; > > acc_mode = 0; > > } > > error = may_open(mnt_userns, &nd->path, acc_mode, open_flag); > > > > I suspect this is the code which allows opening a newly created read-only > > file as O_RDWR. (Though I am not 100% sure). > > Correct. I could see it. > > > > > I suspect with first patch this will be broken. We will set FMODE_CREATED > > even if file already existed and VFS will assume a new file has been > > created and do bunch of things which is wrong. > > > > So looks like fuse ->atomic_open() should set FMODE_CREATED only if > > it really created the file. > > This looks like an obvious bug with new patches. But If I do not miss > anything then its a bug on distributed file systems with current code > as well. > It could happen this way(Taking your example which is perfect to trace > this on distributed systems): > Lets start with File is non-existent yet on the file system. There > comes the first client which does a lookup on the file, it does not > find the file. Meanwhile another client created the file on the File > system. Now when this first client goes to create the file, before > going down it sets FMODE_CREATED on the file handle and then goes down > the lower layers. It comes back with inode but f->mode as > FMODE_CREATED which is incorrect. This mode then results in acc_mode > being set to zero which then allows access to the file as O_RDWR. I think with current code (FUSE_CREATE), it is a race with shared filesystems where multiple clients might be sharing this directory. We are looking up dentry first and issue FUSE_CREATE only if dentry is negative. Now it is possible that between lookup() + FUSE_CREATE, another client dropped in same file in the directory. But one could argue, that's something not detectable by this client. So from user's perspective, this client created the file. If we have a negative dentry, then we will not do the lookup and assumption is that we created file (even if another client created it between previous lookup and this creation). So agreed, this is a race but might not be very harmful one because looks like we are providing weaker coherency with FUSE as opposed to local filesystems. In fact this case fuse server running on a directory which is shared by multiple clients bothers me a lot. There seem to be many corner cases where it is not clear what will happen if another client is doing operation. > > I think Miklos proposed to return the flag from user space if the file > was actually created, this would solve two problems 1) this access > problem and code execution going the wrong path 2) correct update if > parent dir changed or not. Agreed that we could use new command FUSE_ATOMIC_CREATE/FUSE_CREATE_EXT to figure out if file was newly created or not and then set FMODE_CREATED accordingly. W.r.t dir change, I am assuming we are invalidating dir attributes just because if we created file, it could change dir's mtime/ctime. So yes, FUSE_ATOMIC_CREATE/FUSE_CREATE_EXT could return this info whether file was actually crated or not and invalidate dir's attribute accordingly. Thanks Vivek