Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 30 Jul 2009, Ian Kent wrote:
> Sage Weil wrote:
> > On Wed, 29 Jul 2009, Ian Kent wrote:
> >>> How is it possible for a revalidate to fail and then to skip over the 
> >>> ->lookup() call?  That sounds like a problem with the VFS?  (Like the one 
> >>> I'm trying to fix... you're testing with the real_lookup patch applied, I 
> >>> hope?)
> >> It's been hard to work out exactly what's happening but I suspect it is
> >> due to multiple walks occurring while the the dentry hashed state
> >> changes. Like, one process hits revalidate, drops the dentry, another
> >> process comes along and goes straight to lookup and rehashes the dentry,
> >> the original process, that dropped the dentry, then ends up not calling
> >> lookup. The race only happens occasionally, and my test uses 10
> >> processes that tend to hit the same dentrys at the same time at various
> >> points in the mount tree over 150 iterations. Admittedly, it's a fairly
> >> stressful test as autofs goes but it does tend to show up races quite well.
> > 
> > If I'm reading namei.c correctly, every ->d_revalidate() that returns NULL 
> > will always result in a ->lookup() (unless IS_DEADDIR(inode)), without any 
> > regard for the dentry's hashed/unhashed status.  The exception appears to 
> > be a call in __link_path_walk when FS_REVAL_DOT is set in the superblock 
> > (nfs only):
> > 
> > 		if (nd->path.dentry && nd->path.dentry->d_sb &&
> > 		    (nd->path.dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
> > 			err = -ESTALE;
> > 			/* Note: we do not d_invalidate() */
> > 			if (!nd->path.dentry->d_op->d_revalidate(
> > 					nd->path.dentry, nd))
> > 				break;
> > 
> > in which case userspace would see an ESTALE.
> > 
> > It sounds like a few well placed printks that include the pid for unhash, 
> > rehash, revalidate, and lookup would narrow down what is going on?
> 
> Of course I've been doing that.
> 
> I think you need to re-read what I wrote above. Your analysis is fine if
> there is only one process concurrently walking onto the dentry. If a
> second process beats the revalidate process to real_lookup() and
> incorrectly rehashes the dentry the following revalidate process will
> not call ->lookup().

Sorry, you're right.  I missed the d_lookup call in real_lookup.  So you 
mean something like:

do_lookup
	__d_lookup -> finds dentry
	do_revalidate -> drops dentry, returns null
	real_lookup
		take i_mutex
		d_lookup -> returns dentry rehashed by racing ->lookup()

> My most recent try at this seems to be working.
> I need to clean up the printks and alter one of the comments and re-test
> with a full test (I have been working with only half the test enabled as
> it was most problematic). The testing takes a long time to run, 1 hour
> 10 minutes for each of two configurations and the criteria for success
> in at least six consecutive successful runs, that's about thirteen
> hours. But, hopefully, when that is done I'll have a series to post for
> comment.

Great to hear.  I hope my inane blathering hasn't been totally useless! :)

Thanks-
sage
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux