Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held

Ian Kent <raven@xxxxxxxxxx> · Thu, 26 Mar 2009 11:09:31 +0900

Sage Weil wrote:
> On Thu, 26 Mar 2009, Ian Kent wrote:
>> Ian Kent wrote:
>>> Sage Weil wrote:
>>>> On Tue, 24 Mar 2009, Ian Kent wrote:
>>>>> Ian Kent wrote:
>>>>>> Sage Weil wrote:
>>>>>>> real_lookup() is called by do_lookup() if dentry revalidation fails.  If
>>>>>>> the cache is re-populated while waiting for i_mutex, it may find that
>>>>>>> a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).
>>>>>>>
>>>>>>> Previously, real_lookup() would drop i_mutex and do_revalidate() again. If
>>>>>>> revalidate failed _again_, however, it would give up with -ENOENT.  The
>>>>>>> problem here that network file systems may be invalidating dentries via
>>>>>>> server callbacks, e.g. due to concurrent access from another client, and
>>>>>>> -ENOENT is frequently the wrong answer.
>>>>>> This will be something of a problem for autofs4 (and autofs).
>>>>>> It would require fairly significant changes to the revalidate code.
>>>>> Or maybe not that significant, I'm not sure yet.
>>>> Can you be more specific about your concern?  Since the locking rules 
>>>> aren't really changing (see below), the main thing to worry about is the 
>>>> loss of the wonky -ENOENT error..
>>> Historically, the lock inconsistency has been around for a long time and
>>> autofs(4) has worked around that issue. The situation is that when
>>> autofs(4) needs to send a request back to user space it must be sent
>>> without the mutex held. This callback can be needed in the lookup or in
>>> revalidate, depending on the situation. Currently when the callback is
>>> needed in revalidate, the lock is never held.
>>>
>>> If the lock is now held through revalidate then I will need to move the
>>> lock release/re-acquire from the lookup to the revalidate. My initial
>>> concern was that if we need to release and re-aquire the lock more than
>>> once in the revalidate code the complexity of potential deadlock goes up
>>> considerably. Having put so much effort into dealing with deadlock
>>> issues in lookup with just one release and re-acquire I'm worried I'll
>>> end up with more problems.
>>>
>>> However, it looks like I will only need to do the release/re-acquire
>>> once in revalidate, so maybe the status-quo will be maintained and no
>>> additional issues will surface.
>>>
>> It's just occurred to me that your proposing to hold the mutex over one
>> of the two calls that can occur as a result of do_lookup(). That's not
>> good because then we possibly have mixed locking for the same VFS
>> operation. Last time I tried to deal with this I got into a "am I the
>> lock owner, can I release it" dilemma.
>>
>> So, I guess I'm saying that, to do this we would need to hold the lock
>> for both the calls to revalidate.
> 
> I guess the question is whether consistent revalidate rules from the point 
> of do_lookup() are intended.  There are various other paths (through 
> lookup_hash() and cached_lookup()) that do take the mutex.  Whether you 
> get 'mixed locking' is just a matter of what starting point you choose.
> 
> And as far as I can tell, that's a good thing.  In general, we want an 
> unlocked revalidate to avoid mutex contention during cached lookups.  But 
> we also want locked revalidate for paths where changes are being made to 
> avoid races.  It sounds like autofs was just getting lucky before since 
> none of the operations that use those paths are supported.

It's not due to luck.

> 
> Would it be possible to avoid the upcall in revalidate, and instead fail 
> and let the subsequent lookup path do it?  (I'm guessing the upcall 
> doesn't happen for _every_ revalidate?)

Yes, that's right, just every revalidate from processes that aren't the
automount process itself. The normal case is the mount succeeds and
further walks follow the mount from then on until it expires.

It was more than three years ago when I tried to make everything go
through lookup so my memory is pretty unclear now. In the end I think
there was one case in real_lookup() where the lookup was skipped,
revalidate was called and failed but lookup wasn't then called again and
I got an incorrect failure. AFAICR all other code paths that hold the
mutex over lookup and revalidate perform the revalidate first and then
the lookup if that fails, which avoids this case.

> 
> 
>> But if revalidate returns an error code then it will be returned and so
>> __link_path_walk() will then return that. At least that is what I
>> intended when I did it, but maybe I didn't get it quite right for
>> everyone, for example there would be no call to d_invalidate in that
>> case. Can you perhaps use that mechanism?
> 
> I'm not sure what you mean.  When this problem currently comes up, 
> revalidate doesn't want to return an error... it wants ->lookup() to be 
> called again, but instead real_lookup() is returning -ENOENT.  The obvious 
> alternative of looping (instead of holding i_mutex over the revalidate) 
> would (at least theoretically) be starvation prone.

I just don't understand what you need to resolve.
If returning an error isn't what you need then that's just the way it is.

Ian
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html