Re: Linux 2.6.26-rc4

Ian Kent <raven@xxxxxxxxxx> · Wed, 04 Jun 2008 01:38:16 +0800

On Tue, 2008-06-03 at 18:30 +0100, Al Viro wrote:
> On Wed, Jun 04, 2008 at 01:13:08AM +0800, Ian Kent wrote:
> 
> > "What happens is that during an expire the situation can arise
> > that a directory is removed and another lookup is done before
> > the expire issues a completion status to the kernel module.
> > In this case, since the lookup gets a new dentry, it doesn't
> > know that there is an expire in progress and when it posts its
> > mount request, matches the existing expire request and waits
> > for its completion. ENOENT is then returned to user space
> > from lookup (as the dentry passed in is now unhashed) without
> > having performed the mount request.
> > 
> > The solution used here is to keep track of dentries in this
> > unhashed state and reuse them, if possible, in order to
> > preserve the flags. Additionally, this infrastructure will
> > provide the framework for the reintroduction of caching
> > of mount fails removed earlier in development."
> > 
> > I wasn't able to do an acceptable re-implementation of the negative
> > caching we had in 2.4 with this framework, so just ignore the last
> > sentence in the above description. 
> 
> > Unfortunately no, but I thought that once the dentry became unhashed
> > (aka ->rmdir() or ->unlink()) it was invisible to the dcache. But, of
> > course there may be descriptors open on the dentry, which I think is the
> > problem that's being pointed out.
>  
> ... or we could have had a pending mount(2) sitting there with a reference
> to mountpoint-to-be...
> 
> > Yes, that would be ideal but the reason we arrived here is that, because
> > we must release the directory mutex before calling back to the daemon
> > (the heart of the problem, actually having to drop the mutex) to perform
> > the mount, we can get a deadlock. The cause of the problem was that for
> > "create" like operations the mutex is held for ->lookup() and
> > ->revalidate() but for "path walks" the mutex is only held for
> > ->lookup(), so if the mutex is held when we're in ->revalidate(), we
> > could never be sure that we were the code path that acquired it.
> > 
> > Sorry, this last bit is unclear.
> > I'll need to work a bit harder on the explanation if you're interested
> > in checking further.
> 
> I am.
> 
> Oh, well...  Looks like RTFS time for me for now...  Additional parts of
> braindump would be appreciated - the last time I've seriously looked at
> autofs4 internals had been ~2005 or so ;-/

You will find other problems.

The other bit to this is the patch to resolve the deadlock issue I spoke
about just above. This is likely where most of the current problems
started, along with the fact that we have always had to drop the mutex
to call back the daemon.

I can post the patches as well if that helps.

The description accompanying that patch was (the inconsistent locking
referred to here is what was described above):

"Due to inconsistent locking in the VFS between calls to lookup and
revalidate, deadlock can occur in the automounter.

The inconsistency is that the directory inode mutex is held for both
lookup and revalidate calls when called via lookup_hash whereas it is
held only for lookup during a path walk. Consequently, if the mutex
is held during a call to revalidate, autofs4 can't release the mutex
to call back the daemon as it can't know whether it owns the mutex.

This situation happens when a process tries to create a directory
within an automount and a second process also tries to create the
same directory between the lookup and the mkdir. Since the first
process has dropped the mutex for the daemon callback, the second
process takes it during revalidate leading to deadlock between the
autofs daemon and the second process when the daemon tries to create
the mount point directory.

After spending quite a bit of time trying to resolve this on more than
one occasion, using rather complex and ugly approaches, it turns out
that just delaying the hashing of the dentry until the create operation
works fine."
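
To make the "can't know whether it owns the mutex" point concrete, here is
a rough user-space sketch. It is a toy model only, not kernel code: the
lock stands in for the directory inode mutex, and every function name is
made up for illustration. It shows that a callee which can only ask "is
the mutex held?" gets the same answer whether its own caller took the
mutex (the lookup_hash case) or some other thread did (the path walk
case).

```python
import threading

# Stand-in for the directory inode mutex (assumption: a plain,
# non-recursive lock with no owner tracking).
dir_mutex = threading.Lock()


def revalidate_sees_mutex_held():
    """Hypothetical stand-in for revalidate: it can only ask whether
    the mutex is held, not who holds it."""
    return dir_mutex.locked()


def create_like_path():
    """Models the lookup_hash case: the caller itself holds the mutex
    across the revalidate call."""
    with dir_mutex:
        return revalidate_sees_mutex_held()


def path_walk_while_other_holds():
    """Models the path walk case: the caller does NOT hold the mutex,
    but an unrelated thread happens to hold it at the same time."""
    held = threading.Event()
    done = threading.Event()

    def other_holder():
        with dir_mutex:
            held.set()   # tell the walker the mutex is now taken
            done.wait()  # keep holding until the walker has looked

    t = threading.Thread(target=other_holder)
    t.start()
    held.wait()
    try:
        return revalidate_sees_mutex_held()
    finally:
        done.set()
        t.join()


observed_as_owner = create_like_path()
observed_as_bystander = path_walk_while_other_holds()
```

Both calls report the mutex as held, so in this model the callee has no
safe basis for releasing it before calling back to the daemon.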

Ian
