Sage Weil wrote:
> On Wed, 24 Jun 2009, Ian Kent wrote:
>> Ian Kent wrote:
>>> Sage Weil wrote:
>>>> Hi Ian,
>>>>
>>>> Have you had a chance to look at getting autofs4 lookup/revalidate
>>>> adjusted so that this real_lookup() fix[1] can go in?
>>>>
>>>> Please let me know if there is anything I can do to help here. If you're
>>>> still occupied, I'm happy to spin something up and send it your way...
>>>> just let me know.
>>> Sorry, I haven't had time to do more on this.
>>> There is also the issue of what to do about removing the autofs module
>>> and renaming autofs4 to autofs, as this will break the autofs module.
>>>
>>> I did start contacting people I think would want to know about this but
>>> haven't gone further than an initial mail.
>>>
>>> The other thing is that this patch was originally written quite a while
>>> ago and, although it appears to work ok, I'm not sure it's quite what we
>>> need.
>> I'm continuing with this now, but there's a deadlock in there somewhere!
>
> Sorry, are you still working with the patch you posted a few months back?
>
> http://marc.info/?l=linux-fsdevel&m=123831685111213&w=2
>
> Looking over it, the
>
> + unsigned int lock_held = mutex_is_locked(&dir->i_mutex);
> ...
> + if (lock_held) {
> +         /* Already pending, send to ->lookup() */
> +         d_drop(dentry);
>
> bit looks highly suspect. I'm guessing revalidate should never sleep, and
> always kick things off to ->lookup() (to do any waiting on upcall
> completion or whatever else) if the dentry isn't valid now...?

I tried your suggestion and have finally come to the conclusion that it
cannot work. My own fault really, for not fully understanding why I used
the above approach in the first place.

I believe that if the mutex is not held then I "must" handle it in the
revalidate routine and if the mutex is held I "must" defer to ->lookup().
The only way to send this to ->lookup() is to drop the dentry and rehash
it in ->lookup(), and the mutex must be held over both calls. Otherwise it
is possible for an execution path to skip over the lookup call when
several concurrent processes walk into the same dentry at the same time.
AFAICS it isn't possible to detect this and work around it when sending
everything to ->lookup().

This digression was quite costly in time for me but useful in improving
my understanding of the problem. I'm going to return to my original
approach; hopefully I will make better progress.

Ian
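
For anyone following the thread, the rule described above can be sketched
as a small userspace model. This is only an illustration of the decision,
not the autofs4 patch: `toy_dentry`, `toy_revalidate()`, `toy_lookup()`
and `toy_walk()` are hypothetical names, the `dir_mutex_held` flag stands
in for `mutex_is_locked(&dir->i_mutex)`, and clearing/setting `hashed`
stands in for d_drop() and the rehash done in ->lookup(). It is
single-threaded, so it does not reproduce the concurrent race itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a dentry; NOT the kernel's struct dentry. */
struct toy_dentry {
	bool hashed;	/* still reachable through the hash */
	bool pending;	/* an automount request is outstanding */
};

enum toy_action {
	TOY_VALID,			/* nothing pending */
	TOY_HANDLED_IN_REVALIDATE,	/* mutex not held: handled here */
	TOY_DEFERRED_TO_LOOKUP		/* mutex held: dropped, sent on */
};

/* Models the revalidate decision described in the message. */
static enum toy_action toy_revalidate(struct toy_dentry *d,
				      bool dir_mutex_held)
{
	if (!d->pending)
		return TOY_VALID;

	if (dir_mutex_held) {
		/* Models d_drop(dentry): unhash so the walk goes to lookup. */
		d->hashed = false;
		return TOY_DEFERRED_TO_LOOKUP;
	}

	/* Mutex not held: the pending request is handled in revalidate. */
	d->pending = false;
	return TOY_HANDLED_IN_REVALIDATE;
}

/* Models ->lookup(): only legal while the directory mutex is held. */
static void toy_lookup(struct toy_dentry *d, bool dir_mutex_held)
{
	assert(dir_mutex_held);
	d->pending = false;
	d->hashed = true;	/* models rehashing the dropped dentry */
}

/*
 * One path walk. The same dir_mutex_held value covers both calls, which
 * models "the mutex must be held over both calls".
 */
static enum toy_action toy_walk(struct toy_dentry *d, bool dir_mutex_held)
{
	enum toy_action action = toy_revalidate(d, dir_mutex_held);

	if (action == TOY_DEFERRED_TO_LOOKUP)
		toy_lookup(d, dir_mutex_held);
	return action;
}
```

In this model a dentry is only ever unhashed between toy_revalidate() and
toy_lookup() within a single toy_walk() call. That window is what the
mutex has to cover in the real code, so that no other walker can see the
dropped dentry and skip the lookup.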