On Mon, 4 May 2015 06:11:29 +0100 Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:

> On Fri, Apr 24, 2015 at 02:42:03PM +0100, Al Viro wrote:
>
> > That avoids this spin_lock() on each absolute symlink at the price of extra
> > 32 bits in struct nameidata.  It looks like doing on-demand reallocation
> > of nd->stack is the right way to go anyway, so the pressure on nameidata size
> > is going to be weaker and that might be the right way to go...
>
> OK, on-demand reallocation is done.  What I have right now is
>   * flat MAXSYMLINKS 40, no matter what kind of nesting there might
> be.
>   * purely iterative link_path_walk().
>   * no damn nameidata on stack for generic_readlink()
>   * stack footprint of the entire thing independent from the nesting
> depth, and about on par with "no symlinks at all" case in mainline.
>   * some massage towards RCU follow_link done (in the end of queue),
> but quite a bit more work remains.
>
> What I've got so far is in vfs.git#link_path_walk; I'm not too happy about
> posting a 70-chunk mailbomb, but it really will need review and testing.
> It survives xfstests and LTP with no regressions, but it will need
> serious profiling, etc., along with RTFS.  I tried to keep it in reasonably
> small pieces, but there's a lot of them ;-/
>
> FWIW, I've a bit more reorganization plotted out, but it's not far from
> where we need to be for RCU follow_link.  Some notes:
>   * I don't believe we want to pass flags to ->follow_link() - it's
> much simpler to give the damn thing NULL for dentry in RCU case.  In *all*
> cases where we might have a chance to get the symlink body without blocking
> we can do that by inode alone.  We obviously want to pass dentry and inode
> separately (and in case of fast symlinks we don't hit the filesystem at
> all), but that's it - flags isn't needed.
>   * terminate_walk() should do bulk put_link().  So should the
> failure cases of complete_walk().  _Success_ of complete_walk() should
> be careful about legitimizing links - it *can* be called with one link
> on stack, and be followed by access to link body.  Yes, really - do_last()
> in O_CREAT case.
>   * do_last(), lookup_last() and mountpoint_last() ought to
> have put_link() done when called on non-empty stack (thus turning the loops
> into something like
>         while ((err = lookup_last(nd)) > 0) {
>                 err = trailing_symlink(nd);
>                 if (err)
>                         break;
>         }
> _After_ the point where they don't need to look at the last component of
> name, obviously.
>   * I think we should leave terminate_walk() to callers in failure
> cases of walk_component() and handle_dots(), as well as get_link().  Makes
> life simpler in callers, actually.  I'll play with that a bit more.
>   * it might make sense to add the second flag to walk_component(),
> in addition to LOOKUP_FOLLOW, meaning "do put_link() once you are done looking
> at the name".  In principle, it promises simpler logics with unlazy_walk(),
> but that's one area I'm still not entirely sure about.  Will need to
> experiment a bit...
>   * nd->seq clobbering will need to be dealt with, as discussed upthread.
>   * I _really_ hate your "let's use the LSB of struct page * to tell
> if we need to kunmap()" approach.  It's too damn ugly to live.  _And_ it's
> trivial to avoid - all we need is to have (non-lazy) page_follow_link_light()
> and page_symlink() to remove __GFP_HIGHMEM from inode->i_mapping before
> ever asking to allocate pages there.  That'll suffice, and it makes sense
> regardless of RCU work - that kmap/kunmap with potential for minutes in
> between (while waiting for stuck NFS server in the middle of symlink traversal)
> is simply wrong.

Thanks!
I'll have another look and see about adding what is needed for RCU symlink
support.

NeilBrown
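
For readers following the quoted notes, here is a rough userspace sketch of the general shape being described: a purely iterative walk that keeps the unwalked rest of the name on an explicit stack, grows that stack on demand, and applies a flat limit of 40 followed links regardless of nesting. It is not the code in vfs.git#link_path_walk. Everything named toy_* is hypothetical, as are EMBEDDED_DEPTH and the flat symlink table standing in for the dcache. Directories, "." and "..", RCU and put_link() are left out, and absolute link bodies are treated as relative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAXSYMLINKS    40	/* flat limit on links followed in one walk */
#define EMBEDDED_DEPTH 2	/* hypothetical size of the embedded stack */

struct toy_link {		/* hypothetical table entry: name -> link body */
	const char *name, *body;
};

struct toy_nd {			/* loosely modelled on struct nameidata */
	const char **stack;	/* saved "rest of name", one per open link */
	const char *internal[EMBEDDED_DEPTH];
	unsigned depth, cap, total_link_count;
};

/* Return the link body if the component is a symlink, NULL otherwise. */
static const char *toy_lookup(const struct toy_link *tbl, size_t n,
			      const char *s, size_t len)
{
	for (size_t i = 0; i < n; i++)
		if (strlen(tbl[i].name) == len && !memcmp(tbl[i].name, s, len))
			return tbl[i].body;
	return NULL;
}

/* Save the rest of the current name; move to the heap only when full. */
static int toy_push(struct toy_nd *nd, const char *rest)
{
	if (nd->depth == nd->cap) {
		unsigned ncap = nd->cap * 2;
		const char **p = malloc(ncap * sizeof(*p));

		if (!p)
			return -ENOMEM;
		memcpy(p, nd->stack, nd->depth * sizeof(*p));
		if (nd->stack != nd->internal)
			free(nd->stack);
		nd->stack = p;
		nd->cap = ncap;
	}
	nd->stack[nd->depth++] = rest;
	return 0;
}

/* Iteratively expand every symlink in name; write the result into out. */
int toy_walk(const struct toy_link *tbl, size_t n, const char *name,
	     char *out, size_t outsz)
{
	struct toy_nd nd = { .cap = EMBEDDED_DEPTH };
	size_t used = 0;
	int err = 0;

	assert(outsz > 0);
	nd.stack = nd.internal;
	out[0] = '\0';
	for (;;) {
		while (*name == '/')
			name++;
		if (!*name) {			/* this name is exhausted */
			if (!nd.depth)
				break;		/* nothing saved: done */
			name = nd.stack[--nd.depth];
			continue;
		}
		size_t len = strcspn(name, "/");
		const char *body = toy_lookup(tbl, n, name, len);

		if (body) {			/* symlink: walk its body next */
			if (++nd.total_link_count > MAXSYMLINKS) {
				err = -ELOOP;
				break;
			}
			err = toy_push(&nd, name + len);
			if (err)
				break;
			name = body;
			continue;
		}
		if (used + len + 2 > outsz) {	/* room for '/', name, NUL */
			err = -ENAMETOOLONG;
			break;
		}
		if (used)
			out[used++] = '/';
		memcpy(out + used, name, len);
		used += len;
		out[used] = '\0';
		name += len;
	}
	if (nd.stack != nd.internal)
		free(nd.stack);
	return err;
}
```

With a table in which "a" points to "b/c" and "b" points to "d", walking "a/e" in this toy yields "d/c/e". A pair of links pointing at each other fails with -ELOOP once 40 links have been followed, however deep the saved-name stack has grown by then. The sketch copies into a fresh allocation instead of calling realloc() because the initial stack is embedded in the structure, which is one plausible way to keep the common case free of heap allocations.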