On Tue, Sep 28, 2021 at 07:11:41AM +0200, Salvatore Bonaccorso wrote:
> Hi Bruce,
>
> On 27.09.2021 17:53, J. Bruce Fields wrote:
> >On Mon, Sep 27, 2021 at 08:10:31AM +0200, Salvatore Bonaccorso wrote:
> >>We recently got the following traces on a NFS server, but I'm not sure
> >>how to further debug this, any hints?
> >
> >The server creates and opens a file in two steps, though it should
> >really be a single atomic operation.
> >
> >That means there's a small possibility somebody could intervene and do
> >something like change the permissions:
> >
> >>
> >>[5746893.904448] ------------[ cut here ]------------
> >>[5746893.910050] nfsd4_process_open2 failed to open newly-created file! status=10008
> >
> >10008 is NFS4ERR_DELAY, so maybe somebody managed to get a delegation
> >before we finished opening?
> >
> >We should be able to prevent that....
> >
> >In your setup are there processes quickly opening new files created by
> >others?
>
> This is very possible. The NFS server is used as a "scratch" place
> accessible from a compute cluster where people can have multiple jobs
> running simultaneously through Slurm and accessing the data. So it is
> possible that users create new files from one running instance and
> access them quickly from the other nodes.
>
> So far I have been unable to artificially trigger the issue, but is
> there anything I can try out to get more information useful for you?

I think the problem's pretty obvious. I'm not sure what the fix should
be.

You can work around it for now by turning off delegations
(echo 0 >/proc/sys/fs/leases-enable before starting nfsd).

--b.
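
P.S. Spelled out a little more, the workaround could look something like the sketch below. The function name set_leases and its optional second argument are made up for illustration; the second argument is only there so you can try it on a scratch file before touching the real sysctl. As far as I know the sysctl file is spelled with a hyphen (leases-enable). It has to run as root, and before nfsd is started.

```shell
#!/bin/sh
# Sketch only: switch file leases off (0) or on (1).
# nfsd hands out delegations through the lease mechanism, so writing 0
# here before nfsd starts should keep it from granting any.
# The optional second argument (hypothetical, for dry runs) overrides
# the target file so the function can be tried without root.
set_leases() {
    value=$1
    file=${2:-/proc/sys/fs/leases-enable}

    # Only 0 and 1 are accepted by this sketch.
    case $value in
        0|1) ;;
        *) echo "set_leases: value must be 0 or 1" >&2; return 2 ;;
    esac

    # Write the setting, then read it back to confirm it took effect.
    printf '%s\n' "$value" > "$file" || return 1
    [ "$(cat "$file")" = "$value" ]
}

# Typical use, as root, before starting nfsd by whatever means your
# distribution provides:
#   set_leases 0
```

Note that this turns off leases for local applications on the server as well, not just NFSv4 delegations, so treat it as a stopgap.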