Sage Weil wrote:
> On Wed, 24 Jun 2009, Ian Kent wrote:
>> Ian Kent wrote:
>>> Sage Weil wrote:
>>>> Hi Ian,
>>>>
>>>> Have you had a chance to look at getting autofs4 lookup/revalidate
>>>> adjusted so that this real_lookup() fix[1] can go in?
>>>>
>>>> Please let me know if there is anything I can do to help here. If you're
>>>> still occupied, I'm happy to spin something up and send it your way...
>>>> just let me know.
>>> Sorry, I haven't had time to do more on this.
>>> There is also the issue of what to do about removing the autofs module
>>> and renaming autofs4 to autofs, as this will break the autofs module.
>>>
>>> I did start contacting people I think would want to know about this but
>>> haven't gone further than an initial mail.
>>>
>>> The other thing is that this patch was originally written quite a while
>>> ago and, although it appears to work ok, I'm not sure it's quite what we
>>> need.
>> I'm continuing with this now, but there's a deadlock in there somewhere!
>
> Sorry, are you still working with the patch you posted a few months back?

It had changed a little but is quite different now.
I have a somewhat better stress test now so things that don't work will
pop out.

>
> http://marc.info/?l=linux-fsdevel&m=123831685111213&w=2
>
> Looking over it, the
>
> +       unsigned int lock_held = mutex_is_locked(&dir->i_mutex);
> ...
> +               if (lock_held) {
> +                       /* Already pending, send to ->lookup() */
> +                       d_drop(dentry);
>
> bit looks highly suspect. I'm guessing revalidate should never sleep, and
> always kick things off to ->lookup() (to do any waiting on upcall
> completion or whatever else) if the dentry isn't valid now...?

Yeah, I've heard that before, ;)
And that may be the case, but that was what I first had.

Sending everything to ->lookup() might be possible but it certainly
isn't that simple. Waiting in ->d_revalidate() isn't that different to
waiting in ->lookup() anyway, as that must always be done without the
directory mutex held.
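To make the control flow in the quoted hunk concrete, here is a minimal
user-space sketch of the decision being debated. It is only a model under
stated assumptions: the enum, its values and reval_decide() are
hypothetical names invented for illustration, not autofs4 identifiers,
and "pending" stands in for whatever mount or expire is in flight.

```c
#include <stdbool.h>

/* Hypothetical outcomes for illustration; not autofs4 identifiers. */
enum reval_action {
    REVAL_VALID,          /* nothing pending, dentry is usable as-is */
    REVAL_WAIT_HERE,      /* directory mutex not held, so sleeping is safe */
    REVAL_DROP_TO_LOOKUP  /* mutex held: unhash and let ->lookup() wait */
};

/*
 * Model of the revalidate-time decision.
 *   pending   - a mount or expire is in flight for this dentry (assumed flag)
 *   lock_held - the parent directory mutex is currently held
 */
static enum reval_action reval_decide(bool pending, bool lock_held)
{
    if (!pending)
        return REVAL_VALID;
    if (lock_held)
        return REVAL_DROP_TO_LOOKUP;  /* mirrors the d_drop() branch */
    return REVAL_WAIT_HERE;
}
```

In this model reval_decide(true, true) gives REVAL_DROP_TO_LOOKUP, which
corresponds to the d_drop() path in the quoted hunk, while
reval_decide(true, false) gives REVAL_WAIT_HERE.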
If the lock isn't held when in ->d_revalidate() I can't really see any
reason not to handle that right there, possibly preventing the need to
go to ->lookup().

There are several cases I need to deal with, apart from path walks
initiated by the daemon, which don't cause any callbacks and so are
largely handled by trivially returning success. The cases are:

 - an expiring dentry that will go away, which ->lookup() can't yet
   handle,
 - an expiring dentry that won't go away, which ->lookup() should be
   able to handle already, and
 - a straight out mount request, which ->lookup() should also be able
   to handle.

The tail end of the expire cases can progress concurrently with a mount,
which is further complicated by the two cases of going away or not, so
it's all a bit tricky.

In any case I need to get this to work without the change you proposed,
except for cases that result from the locking change, and I'm using
printks to track incorrect returns to identify those cases. So what I
need right now is consistent behaviour and I'm not quite there. Once I
have that I'll work on any issues resulting from the locking change.

A lot has changed in the autofs4 module since I first tried to do this
and I now have a fairly aggressive test, so what appeared to work before
actually doesn't, and it isn't as straightforward as I hoped.

Ian