Sage Weil wrote: > On Tue, 24 Mar 2009, Ian Kent wrote: >> Ian Kent wrote: >>> Sage Weil wrote: >>>> real_lookup() is called by do_lookup() if dentry revalidation fails. If >>>> the cache is re-populated while waiting for i_mutex, it may find that >>>> a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment). >>>> >>>> Previously, real_lookup() would drop i_mutex and do_revalidate() again. If >>>> revalidate failed _again_, however, it would give up with -ENOENT. The >>>> problem here that network file systems may be invalidating dentries via >>>> server callbacks, e.g. due to concurrent access from another client, and >>>> -ENOENT is frequently the wrong answer. >>> This will be something of a problem for autofs4 (and autofs). >>> It would require fairly significant changes to the revalidate code. >> Or maybe not that significant, I'm not sure yet. > > Can you be more specific about your concern? Since the locking rules > aren't really changing (see below), the main thing to worry about is the > loss of the wonky -ENOENT error.. Historically, the lock inconsistency has been around for a long time and autofs(4) has worked around that issue. The situation is that when autofs(4) needs to send a request back to user space it must be sent without the mutex held. This callback can be needed in the lookup or in revalidate, depending on the situation. Currently when the callback is needed in revalidate, the lock is never held. If the lock is now held through revalidate then I will need to move the lock release/re-acquire from the lookup to the revalidate. My initial concern was that if we need to release and re-aquire the lock more than once in the revalidate code the complexity of potential deadlock goes up considerably. Having put so much effort into dealing with deadlock issues in lookup with just one release and re-acquire I'm worried I'll end up with more problems. However, it looks like I will only need to do the release/re-acquire once in revalidate, so maybe the status-quo will be maintained and no additional issues will surface. > > sage > > >>>> This problem has been seen with both Lustre and Ceph. It seems possible >>>> to hit this case with NFS as well if the cache lifetime is very short. >>>> >>>> Instead, we should do_revalidate() while i_mutex is still held. If >>>> revalidation fails, we can move on to a ->lookup() and ensure a correct >>>> result without worrying about any subsequent races. >>>> >>>> Note that do_revalidate() is called with i_mutex held elsewhere. For >>>> example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(), >>>> and possibly others all take the directory i_mutex, and then >>>> >>>> -> lookup_hash >>>> -> __lookup_hash >>>> -> cached_lookup >>>> -> do_revalidate >>>> >>>> so this does not introduce any new locking rules for d_revalidate >>>> implementations. >>>> >>>> Yes, the goto is ugly. A cleanup patch follows. >>>> >>>> CC: Al Viro <viro@xxxxxxxxxxxxxxxxxx> >>>> CC: Andreas Dilger <adilger@xxxxxxx> >>>> Signed-off-by: Yehuda Sadeh <yehuda@xxxxxxxxxxxx> >>>> Signed-off-by: Sage Weil <sage@xxxxxxxxxxxx> >>>> --- >>>> fs/namei.c | 5 +++-- >>>> 1 files changed, 3 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/fs/namei.c b/fs/namei.c >>>> index c30e33d..b9e7128 100644 >>>> --- a/fs/namei.c >>>> +++ b/fs/namei.c >>>> @@ -489,6 +489,7 @@ static struct dentry * real_lookup(struct dentry * >>>> parent, struct qstr * name, s >>>> if (!result) { >>>> struct dentry *dentry; >>>> +do_the_lookup: >>>> /* Don't create child dentry for a dead directory. */ >>>> result = ERR_PTR(-ENOENT); >>>> if (IS_DEADDIR(dir)) >>>> @@ -512,12 +513,12 @@ out_unlock: >>>> * Uhhuh! Nasty case: the cache was re-populated while >>>> * we waited on the semaphore. Need to revalidate. >>>> */ >>>> - mutex_unlock(&dir->i_mutex); >>>> if (result->d_op && result->d_op->d_revalidate) { >>>> result = do_revalidate(result, nd); >>>> if (!result) >>>> - result = ERR_PTR(-ENOENT); >>>> + goto do_the_lookup; >>>> } >>>> + mutex_unlock(&dir->i_mutex); >>>> return result; >>>> } >>>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html