Hi Bruce,

Thank you for your response.

On Mon, Aug 26, 2019 at 4:39 PM J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> On Sun, Aug 25, 2019 at 01:12:34PM +0300, Alex Lyakas wrote:
> > You are listed as maintainers of nfsd. Can you please take a look at
> > the below patch?
>
> Thanks!
>
> I take it this was found by some kind of code analysis or fuzzing, not
> use in production?

We are hitting the following issue in production quite frequently:
- We create two local file systems FS1 and FS2 on the server machine S
- We export both FS1 and FS2 through nfsd to the same nfs client,
  running on client machine C
- On C, we mount both exported file systems and start writing files to
  both of them
- After a few minutes, on server machine S, we un-export FS1 only. We
  don't unmount FS1 on the client machine C prior to un-exporting. Also,
  FS2 remains exported to C.
- We want to unmount FS1 on the server machine S, but we fail, because
  there are still open files on FS1 by nfsd.

Debugging this issue showed the following root cause: we have an
nfs4_client entry for the client C. This entry has two nfs4_openowners,
for FS1 and FS2, although FS1 was un-exported. Looking at the stateids
of both openowners, we see that they have stateids of kind
NFS4_OPEN_STID, and each stateid is holding a nfs4_file. The reason we
cannot unmount FS1 is that we still have an openowner for FS1, holding
open-stateids, which hold open files on FS1.

The laundromat doesn't help in this case, because it can only decide
per-nfs4_client that it should be purged. But in this case, since FS2
is still exported to C, there is no reason to purge the nfs4_client.

This situation remains until we un-export FS2 as well. Then the whole
nfs4_client is purged, all the files get closed, and we can unmount
both FS1 and FS2.

We started looking around, and we found the failure injection code
that can "forget openowners". We wrote some custom code that allows us
to select the openowner which is not needed anymore.
We then unhash this openowner, and unhash and close all of its
stateids. Then the files get closed, and we can unmount FS1.

Is the described issue familiar to you? It is very easily
reproducible. What is the way to solve it? To our understanding, if we
un-export a FS from nfsd, we should be able to unmount it.

For example, could we introduce a sysfs or procfs entry that lists all
clients and openowners, and then add another sysfs entry allowing the
user to "forget" a particular openowner? If you feel this is the way to
move forward, we can try to provide patches for review.

Thanks,
Alex.

>
> Asking because I've been considering just deprecating it, so:
>
> > > After we fixed this, we confirmed that the openowner is not freed
> > > prematurely. It is freed by release_openowner() final call
> > > to nfs4_put_stateowner().
> > >
> > > However, we still get (other) random crashes and memory corruptions
> > > when nfsd_inject_forget_client_openowners() and
> > > nfsd_inject_forget_openowners().
> > > According to our analysis, we don't see any other refcount issues.
> > > Can anybody from the community review these flows for other potentials issues?
>
> I'm wondering how much effort we want to put into tracking all that
> down.
>
> --b.