On Fri, Jan 14, 2011 at 2:12 AM, David Howells <dhowells@xxxxxxxxxx> wrote: > Nick Piggin <npiggin@xxxxxxxxx> wrote: > >> > I would still prefer to see a .follow_mount API, and not tie in this >> > automount specific inode detail to what could be a more flexible >> > dentry-only API. >> > >> > The default NULL implementation would do nothing, and follow_automount >> > stuff can be in fs/libfs.c to be used by filesystem, rather than >> > fs/namei.c. >> >> Looking further in the patchset at the d_managed "thing", that's almost what >> I'm getting at. >> >> But I don't see why any of this stuff has to happen in fs/namei.c. Just call >> the function from path walk, and provide helpers in libfs or something if >> there is a lot of common code between autofs4 and others (and leave it autofs >> specifc when that is the case). >> >> Of course, that would be the obvious and naive first approach. So really my >> question is why did that not work? And can we make it work? > > You have a strange idea of what is 'obvious and naive'. These are parts of > pathwalk, and as such should be in fs/namei.c. I'd rather not expose > pathwalking directly to the filesystem, though I acknowledge that sometimes it > is necessary to let the filesystem influence it. No, path walk would *not* be affected or implemented at all by the filesystem, if you read what I wrote. The only difference is a function called at follow_mount time, which may decide to mount something on there. All the autofs stuff would be constrained within auto mounting filesystems and libraries. follow_automount belongs in automount code, not path walking. Also, I don't like how you return a vfsmount * and use that in path walking. If you just returned success and allowed the path walk to continue looking up mounted directories, it would be cleaner code, IMO. > You need to consider d_automount() and d_manage() separately as they provide > two quite different hooks with different properties. > > Firstly, d_automount(). The following are my points of consideration. > > (0) You're only allowed to automount on a directory. > > (1) Automounting does not need to be done when we follow ".." to an automount > point. > > (2) Automount points that are mounted upon within the current namespace can > just be skipped over. This is the fast path. > > (3) All the filesystem should need as a parameter to determine what it is > allowed to mount is the inode and dentry of the automount point. This > holds true for all the things that currently do automounting (AFS, CIFS, > NFS, autofs). > > (4) All the filesystem should need to do is set up a vfsmount struct and > publish it or return an indication that there was a collision and the > transit should be retried. > > (5) The filesystem is expected to sleep to achieve the automount, so > spinlocks, RCU locks, preemption disablements or interrupt disablements > may not be held across this function. > > (6) The filesystem is expected to need a write lock on namespace_sem at some > point, so this must not be held across the call to d_automount(). > > (7) The filesystem won't necessarily be calling do_add_mount() itself in > d_automount() - in autofs's case, the construction is performed by the > userspace daemon and then autofs4_d_automount() indicates a collision - so > we can't move the do_add_mount() to the caller. Additionally, the > filesystem may want to use an expiration list. > > (8) There needs to be some limitation in place to prevent runaway > automounting. The ELOOP detection mechanism can be used for this. I don't know why the dentry d_automount routine has to depend on inodes at all. That should all be stuffed into the filesystem. > Taking these considerations, it shows that a small amount of code can be > inserted into pathwalk and used for everything. However, having worked with > Ian to try and get autofs4 to work with this, we came up with d_manage() to add > in a missing piece. > > Note that autofs4 also uses d_automount() to build directory structures behind > the mountpoint rather than mounting on the mountpoint. In this case, it clears > the AUTOMOUNT flag when construction is complete. And all of this can be done in autofs4 private flags. > I've allowed d_automount() to return -EISDIR to follow_automount() to indicate > that no mount should be attempted here, and the directory should be given back > to pathwalk to treat as a directory. This allows autofs's daemon access to the > directory. > > Having follow_automount() update the path it has been given with the new > vfsmount and root dentry is purely an optimisation; we could instead simply > return and __follow_mount() will do lookup_mnt() again as it would if a > collision is reported. > > In answer to why I haven't made __follow_mount_rcu() handle automount points, I > thought previously I saw a reason why it was unnecessary, but now I'm not so > sure. It may be that if there are child objects of this dentry then it will > walk onto those rather than automounting - but for some reason it seems still > to automount rather than doing that. > > > Secondly, d_manage(). The following are the points of consideration: > > (1) A filesystem may want to hold up client processes that want to transit > from directories in its control during pathwalk - such as when autofs is > letting its userspace daemon tear down the stuff mounted on or created > behind a directory. > > (2) A transit may be from a directory to a directory mounted over it, or from > a directory to an object (file, dir, etc.) pointed to by an entry in that > directoy. > > (3) The management of dentries in this fashion is a transient affair. > > (4) The mode in which the filesystem is normally entered for this purpose > should be disabled as soon as possible, though it may be reenabled later > if needed. > > (5) When the filesystem is ready it should let the held processes proceed or > it may reject them. > > (6) The filesystem is expected to sleep to achieve this, so spinlocks, RCU > locks, preemption disablements or interrupt disablements may not be held > across this function. > > (7) The filesystem must be able to let some processes through whilst holding > up others. This allows autofs to let its daemon pass to construct or > destroy stuff. So if you change .d_automount as I suggested above to just return success and do "something", then it would seem to handle all these cases just as well. > (8) Because do_add_mount() and do_move_mount() transit through piles of > mounted directories, the filesystem may have to contend with being called > with namespace_sem held. This means initiating (un)mounting, even > indirectly via userspace, is not permitted from this function. With this little detail still to take care of. So as far as I can see, they're not completely different but very similar in that the filesystem wants a hook before mounts are followed (better name might be .d_follow_mount). > Taking the above into consideration, we determined that we couldn't use one > entry point for both automounting and holding up. Autofs had the possibility > to deadlock against itself because of (8). I'm not against 2 entry points, or a parameter to distinguish the cases. That seems like a trivial detail compared with the main issues. >> Unless there are really fundamental reasons why rcu-walk won't work (eg. >> ->lookup, if it is required to do allocations and IO), or if it really >> doesn't matter (eg. a function to be called once to set up a mount, and never >> again). > > d_automount() is expected to sleep, so there's no point not cancelling rcu-walk > mode before entering it. Granted there will be rare occasions when the > cancellation proves unnecessary because there was a mount collision, but even > then it's very likely we'll be calling alloc_vfsmnt() - which may sleep - and > doing I/O - which may also sleep - before we realise we've got a collision. > > Besides, d_automount() is skipped without cancelling rcu-walk mode if the > directory is mounted over. That is the fast path, and I'd rather not route > that through the filesystem. Fair enough. If it becomes a more general .d_follow(), then allowing rcu-walk is probably a good idea. > So after automounting has happened, all subsequent transits of that mountpoint > will maintain rcu-walk mode, assuming they don't meet DCACHE_MANAGE_TRANSIT. > > > I grant that I could extend RCU pathwalk through d_manage(). The autofs daemon > might benefit from that, but I contend that the benefit would be minor overall, > and any processes that do get held up will have to abandon RCU pathwalk so that > they can sleep. We'll see. Relatively minor detail to change the API for that case after we agree on the larger form of the API. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html