Re: [RFC] Reinstate NFS exportability for JFFS2.

Neil Brown <neilb@xxxxxxx> · Fri, 1 Aug 2008 12:14:17 +1000

On Friday August 1, dwmw2@xxxxxxxxxxxxx wrote:
> On Fri, 2008-08-01 at 10:16 +1000, Neil Brown wrote:
> > On Thursday July 31, dwmw2@xxxxxxxxxxxxx wrote:
> > > On Fri, 2008-05-02 at 11:38 +1000, Neil Brown wrote:
> > > > Why is there a deadlock here?
> > 
> > I was really hoping you would answer this question.
> 
> It's because the nfsd readdirplus code recurses into the file system.
> >From the file system's ->readdir() function we call back into nfsd's
> encode_entry(), which then calls back into the file system's ->lookup()
> function so that it can generate a filehandle.
> 
> For any file system which has its own internal locking -- which includes
> jffs2, btrfs, xfs, jfs, gfs*, ocfs* and probably others -- that
> recursive call back into the file system will deadlock.
> 
> In the specific case of JFFS2, we need that internal locking because of
> lock ordering constraints with the garbage collection -- we need to take
> the allocator lock _before_ the per-inode lock, which means we can't use
> the generic inode->i_mutex for that purpose. That's documented in
> fs/jffs2/README.Locking. I know fewer details about the other affected
> file systems.

It sounds to me like the core problem here is that the locking regime
imposed by the VFS simply isn't suitable for JFFS2 .. and a bunch of
other filesystems.

This seems to suggest that the VFS should be changed.
Superficially, it seems that changing the locking rules so that the
callee takes i_mutex rather than the caller taking it would help and,
in the case of JFFS2, make f->sem redundant.  Does that match your
understanding?

That is obviously a big change and one that should not be made
lightly, but if it was to benefit a number of the newer filesystems, then
it would seem like the appropriate way to go.

Clearly we need a short term solution too as we don't want to wait for
VFS locking rules to be renegotiated.  The idea of a "lock is owned by
me" check is appealing to me as it is a small, localised change that
would easily be removed if/when the locking we "fixed" properly.

In the JFFS2 case I imagine this taking the following form:

 - new field in jffs2_inode_info "struct task_struct *sem_owner",
   initialised to NULL
 - in jffs2_readdir after locking ->sem, set ->sem_owner to current.
 - before unlocking ->sem, set ->sem_owner to NULL
 - in jffs2_lookup, check if "dir_f->sem_owner == current"
   If it does, set a flag reminding us not to drop the lock,
   else mutex_lock(&dir_f->sem);

This should fix the problem with very little cost either in
performance or elegance.  And if/when the locking rules were changed,
the accompanying code review would immediately notice this and remove
it.

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html