On Friday August 1, dwmw2@xxxxxxxxxxxxx wrote: > On Fri, 2008-08-01 at 10:16 +1000, Neil Brown wrote: > > On Thursday July 31, dwmw2@xxxxxxxxxxxxx wrote: > > > On Fri, 2008-05-02 at 11:38 +1000, Neil Brown wrote: > > > > Why is there a deadlock here? > > > > I was really hoping you would answer this question. > > It's because the nfsd readdirplus code recurses into the file system. > >From the file system's ->readdir() function we call back into nfsd's > encode_entry(), which then calls back into the file system's ->lookup() > function so that it can generate a filehandle. > > For any file system which has its own internal locking -- which includes > jffs2, btrfs, xfs, jfs, gfs*, ocfs* and probably others -- that > recursive call back into the file system will deadlock. > > In the specific case of JFFS2, we need that internal locking because of > lock ordering constraints with the garbage collection -- we need to take > the allocator lock _before_ the per-inode lock, which means we can't use > the generic inode->i_mutex for that purpose. That's documented in > fs/jffs2/README.Locking. I know fewer details about the other affected > file systems. It sounds to me like the core problem here is that the locking regime imposed by the VFS simply isn't suitable for JFFS2 .. and a bunch of other filesystems. This seems to suggest that the VFS should be changed. Superficially, it seems that changing the locking rules so that the callee takes i_mutex rather than the caller taking it would help and, in the case of JFFS2, make f->sem redundant. Does that match your understanding? That is obviously a big change and one that should not be made lightly, but if it was to benefit a number of the newer filesystems, then it would seem like the appropriate way to go. Clearly we need a short term solution too as we don't want to wait for VFS locking rules to be renegotiated. The idea of a "lock is owned by me" check is appealing to me as it is a small, localised change that would easily be removed if/when the locking we "fixed" properly. In the JFFS2 case I imagine this taking the following form: - new field in jffs2_inode_info "struct task_struct *sem_owner", initialised to NULL - in jffs2_readdir after locking ->sem, set ->sem_owner to current. - before unlocking ->sem, set ->sem_owner to NULL - in jffs2_lookup, check if "dir_f->sem_owner == current" If it does, set a flag reminding us not to drop the lock, else mutex_lock(&dir_f->sem); This should fix the problem with very little cost either in performance or elegance. And if/when the locking rules were changed, the accompanying code review would immediately notice this and remove it. NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html