On Wed, Oct 27, 2021 at 11:24 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>
> On Wed, Oct 27, 2021 at 08:59:15AM +0300, Amir Goldstein wrote:
> > On Tue, Oct 26, 2021 at 10:14 PM Ioannis Angelakopoulos
> > <iangelak@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Oct 26, 2021 at 2:27 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > >>
> > >> On Tue, Oct 26, 2021 at 08:59:44PM +0300, Amir Goldstein wrote:
> > >> > On Tue, Oct 26, 2021 at 7:18 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > >> > >
> > >> > > On Tue, Oct 26, 2021 at 06:23:50PM +0300, Amir Goldstein wrote:
> > >> > >
> > >> > > [..]
> > >> > > > > 3) The lifetime of the local watch in the guest kernel is very
> > >> > > > > important. Specifically, there is a possibility that the guest does not
> > >> > > > > receive remote events on time, if it removes its local watch on the
> > >> > > > > target or deletes the inode (and thus the guest kernel removes the watch).
> > >> > > > > In these cases the guest kernel removes the local watch before the
> > >> > > > > remote events arrive from the host (virtiofsd) and as such the guest
> > >> > > > > kernel drops all the remote events for the target inode (since the
> > >> > > > > corresponding local watch does not exist anymore).
> > >> > >
> > >> > > So this is one of the issues which has been haunting us in virtiofs. If
> > >> > > a file is removed, for local events, event is generated first and
> > >> > > then watch is removed. But in case of remote filesystems, it is racy.
> > >> > > It is possible that by the time event arrives, watch is already gone
> > >> > > and application never sees the delete event.
> > >> > >
> > >> > > Not sure how to address this issue.
> > >> >
> > >> > Can you take me through the scenario step by step.
> > >> > I am not sure I understand the exact sequence of the race.
> > >>
> > >> Ioannis, please correct me if I get something wrong. You know exact
> > >> details much more than me.
> > >>
> > >> A. Say a guest process unlinks a file.
> > >> B. Fuse sends an unlink request to server (virtiofsd)
> > >> C. File is unlinked on host. Assume there are no other users so inode
> > >>    will be freed as well. And event will be generated on host and watch
> > >>    removed.
> > >> D. Now Fuse server will send an unlink request reply. unlink notification
> > >>    might still be in kernel buffers or still be in virtiofsd or could
> > >>    be in virtiofs virtqueue.
> > >> E. Fuse client will receive unlink reply and remove local watch.
> > >>
> > >> Fuse reply and notification event are now traveling in parallel on
> > >> different virtqueues and there is no connection between these two. And
> > >> it could very well happen that fuse reply comes first, gets processed
> > >> first and local watch is removed. And notification is processed right
> > >> after but by then local watch is gone and filesystem will be forced to
> > >> drop event.
> > >>
> > >> As of now situation is more complicated in virtiofsd. We don't keep
> > >> file handle open for file and keep an O_PATH fd open for each file.
> > >> That means in step D above, inode on host is not freed yet and unlink
> > >> event is not generated yet. When unlink reply reaches fuse client,
> > >> it sends FORGET messages to server, and then server closes O_PATH fd
> > >> and then host generates unlink events. By that time it's too late,
> > >> guest has already removed local watches (and triggered removal of
> > >> remote watches too).
> > >>
> > >> This second problem probably can be solved by using file handles, but
> > >> basic race will still continue to be there.
> > >>
> > >> > If it is local file removal that causes watch to be removed,
> > >> > then don't drop local events and you are good to go.
> > >> > Is it something else?
> > >>
> > >> - If remote events are enabled, then idea will be that user space gets
> > >>   an event when file is actually removed from server, right? Now it
> > >>   is possible that another VM has this file open and file has not been
> > >>   yet removed. So local event only tells you that file has been removed
> > >>   in guest VM (or locally) but does not tell anything about the state
> > >>   of file on server. (It has been unlinked on server but inode continues
> > >>   to be alive internally).
> > >>
> > >> - If user receives both local and remote delete event, it will be
> > >>   confusing. I guess if we want to see both the events, then there
> > >>   has to be some sort of info in event which classifies whether event
> > >>   is local or remote. And let application act accordingly.
> > >>
> > >> Thanks
> > >> Vivek
> > >>
> > >
> > > Hello Amir!
> > >
> > > Sorry for taking part in the conversation a bit late. Vivek was on point with the
> > > example he gave but the race is a bit more generic than only the DELETE event.
> > >
> > > Let's say that a guest process monitors an inode for OPEN events:
> > >
> > > 1) The same guest process or another guest process opens the file (related to the
> > >    monitored inode), and then closes and immediately deletes the file/inode.
> > > 2) The FUSE server (virtiofsd) will mimic the operations of the guest process:
> > >    a) Will open the file on the host side and thus a remote OPEN event is going to
> > >       be generated on the host and sent to the guest.
> > >    b) Will unlink the remote inode and if no other host process uses the inode then the
> > >       inode will be freed and a DELETE event is going to be generated on the host and sent
> > >       to the guest (However, due to how virtiofsd works, as Vivek mentioned, this step won't
> > >       happen immediately)
> > >
> >
> > You are confusing DELETE with DELETE_SELF.
> > DELETE corresponds to unlink(), so you get a DELETE event even if
> > inode is a hardlink with nlink > 0 after unlink().
> >
> > The DELETE event is reported (along with filename) against the parent directory
> > inode, so the test case above won't drop the event.
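
The race in steps A to E above can be sketched as a toy model. This is
not virtiofs or FUSE code. It assumes two independent FIFO queues (one
for FUSE replies, one for remote fsnotify notifications) and a guest
inode whose local watch is removed when the unlink reply is handled.
GuestInode, run() and the "reply"/"notify" tokens are hypothetical
names made up for illustration only.

```python
from collections import deque


class GuestInode:
    """Toy stand-in for a guest inode carrying a local fsnotify watch."""

    def __init__(self):
        self.watched = True
        self.delivered = []  # remote events the application would see
        self.dropped = []    # remote events that found no local watch

    def handle_reply(self, op):
        # Step E: handling the unlink reply removes the local watch.
        if op == "unlink":
            self.watched = False

    def handle_notification(self, event):
        # A remote event is delivered only while the local watch exists.
        if self.watched:
            self.delivered.append(event)
        else:
            self.dropped.append(event)


def run(order):
    """Service the two queues in the given interleaving.

    `order` is a sequence of "reply" / "notify" tokens naming which queue
    is serviced next. Each queue is FIFO on its own, but nothing orders
    one queue relative to the other (the assumption behind the race).
    """
    inode = GuestInode()
    replies = deque(["unlink"])
    notifications = deque(["OPEN", "DELETE_SELF"])
    for which in order:
        if which == "reply" and replies:
            inode.handle_reply(replies.popleft())
        elif which == "notify" and notifications:
            inode.handle_notification(notifications.popleft())
    return inode.delivered, inode.dropped
```

Under these assumptions, run(["notify", "notify", "reply"]) delivers both
events, while run(["reply", "notify", "notify"]) drops both, which is the
ordering discussed in this thread.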
>
> Hi Amir,
>
> Agreed that there is confusion between DELETE and DELETE_SELF events. I
> think Ioannis is referring to DELETE_SELF event. With this example he
> is trying to emphasize that due to races, problem is not limited to
> DELETE_SELF events only and other events could arrive a little later
> after the local watch in guest has been removed and then all those
> events will be dropped as well. So he gave example of OPEN event. And
> I think remote IN_IGNORED might face the same fate.
>
> In the case of IN_IGNORED, I am wondering is it ok to generate that
> event locally instead.
>

Yes, that is what I meant.
Local watch removed can be used as a hint to applications that events might
have been lost.

> >
> > > The problem here is that the OPEN event might still be travelling towards the guest in the
> > > virtqueues and arrives after the guest has already deleted its local inode.
> > > While the remote event (OPEN) received by the guest is valid, its fsnotify
> > > subsystem will drop it since the local inode is not there.
> > >
> >
> > I have a feeling that we are mixing issues related to shared server
> > and remote fsnotify.
> > Does virtiofsd support multiple guests already?
>
> I would like to think that there are not many basic issues with shared
> directory configuration. So we don't stop users from using it.
>
> > There are many other issues related to cache coherency that should be
> > dealt with in this case, some of them overlap the problem that you
> > describe, so solving the narrow problem of dropped remote events
> > seems like the wrong way to approach the problem.
>
> Dropped remote event problem/race will exist even if it was not shared
> server. Remote events travel through different virtqueue as compared
> to fuse request reply. So there is no guarantee in what order events
> or replies will be processed.
>
> >
> > I think that in a shared server situation, the simple LOOKUP/FORGET protocol
> > will not suffice.
>
> Hmm.., So what's the problem with LOOKUP/FORGET in shared dir case?
>

I guess when we say "shared dir" case, we mean different things.
I was thinking about "shared between different clients" and about the fact
that the information about the server refcount of objects is obscured from
the client. For example, LOOKUP_INC/DEC requests that report back the
server's shared object refcount could prevent a guest inode from being
evicted in case inode has a remote watch.

But you were mentioning "shared dir with guest and host", so LOOKUP
refcount is not relevant in that case.

> > I will not even try to solve the generic problem, but will just
> > mention that SMB/NFS protocols use delegations/oplocks in the protocol
> > to advertise object usage by other clients to all clients.
>
> I think Miklos was looking into the idea of some sort of file leases
> on fuse for creating equivalent of delegations. Not sure if that work
> made any progress.
>
> >
> > I think this issue is far outside the scope of your project and you should
> > just leave the dropped events as a known limitation at this point.
>
> Maybe that's what we should do to begin with and just say these events
> can be lost or never arrive.
>
> > inotify has the event IN_IGNORED that application can use as a hint that some
> > events could have been dropped.
>
> That probably will require generating IN_IGNORED locally when local watch
> goes away (and not rely on remote IN_IGNORED), IIUC.
>

Right.

Thanks,
Amir.