On Tue, 16 Sep 2014 08:39:39 -0400 Anna Schumaker
<Anna.Schumaker@xxxxxxxxxx> wrote:

> On 09/16/2014 01:31 AM, NeilBrown wrote:
> > Support for loop-back mounted NFS filesystems is useful when NFS is
> > used to access shared storage in a high-availability cluster.
> >
> > If the node running the NFS server fails, some other node can mount the
> > filesystem and start providing NFS service. If that node already had
> > the filesystem NFS mounted, it will now have it loop-back mounted.
> >
> > nfsd can suffer a deadlock when allocating memory and entering direct
> > reclaim.
> > While direct reclaim does not write to the NFS filesystem it can send
> > and wait for a COMMIT through nfs_release_page().
>
> Is there anything that can be done on the nfsd side to prevent the deadlocks?
>

I went down that path first and it didn't work out.

Setting PF_FSTRANS in nfsd (when the request comes from localhost) and then
arranging that __GFP_FS is cleared when that flag is set overcomes a number
of possible deadlock sources, but not all. There are a number of situations
where nfsd is waiting on some other thread (which doesn't have PF_FSTRANS
set) and that thread tries to reclaim memory and hits nfs_release_page().

It was a long and complex patch set, and nobody liked it. And the common
thread was that it always blocked in nfs_release_page().

So it seemed to make sense to just remove that blockage.

Thanks,
NeilBrown
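
To make the shape of that abandoned approach concrete, here is a minimal userspace sketch of the flag-masking idea and of why it was incomplete. It is not the patch set and not kernel code: the `sketch_` names, the struct, and the bit values are all made up for illustration and do not match the kernel's real PF_FSTRANS or __GFP_FS definitions.

```c
#include <assert.h>

/* Illustrative bit values only; NOT the kernel's real definitions. */
#define SKETCH_PF_FSTRANS 0x1u  /* stands in for PF_FSTRANS on a task */
#define SKETCH_GFP_IO     0x1u  /* allocation may start I/O */
#define SKETCH_GFP_FS     0x2u  /* stands in for __GFP_FS */

/* Hypothetical stand-in for a task; only the flags word matters here. */
struct sketch_task {
    unsigned int flags;
};

/* Clear the FS bit from the allocation mask when the task is flagged. */
unsigned int sketch_effective_gfp(const struct sketch_task *t,
                                  unsigned int gfp)
{
    if (t->flags & SKETCH_PF_FSTRANS)
        gfp &= ~SKETCH_GFP_FS;
    return gfp;
}

/*
 * Non-zero if reclaim on behalf of this task would still be allowed to
 * call into (and wait in) a filesystem hook such as nfs_release_page().
 */
int sketch_may_enter_fs(const struct sketch_task *t, unsigned int gfp)
{
    return (sketch_effective_gfp(t, gfp) & SKETCH_GFP_FS) != 0;
}

/* The two cases described above: a flagged nfsd and an unflagged helper. */
void sketch_selfcheck(void)
{
    struct sketch_task nfsd   = { SKETCH_PF_FSTRANS };
    struct sketch_task helper = { 0 };
    unsigned int gfp = SKETCH_GFP_IO | SKETCH_GFP_FS;

    assert(!sketch_may_enter_fs(&nfsd, gfp));   /* protected */
    assert(sketch_may_enter_fs(&helper, gfp));  /* still exposed */
}
```

The second assertion is the gap: a thread that nfsd is waiting on does not carry the flag, so its allocations keep the FS bit and reclaim can still end up waiting in nfs_release_page().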