On Fri, Apr 17, 2015 at 08:09:10PM +0100, Al Viro wrote: > On Fri, Apr 17, 2015 at 05:25:36PM +0100, Al Viro wrote: > > On Mon, Mar 23, 2015 at 01:37:40PM +1100, NeilBrown wrote: > > > @@ -1669,13 +1669,14 @@ static inline int nested_symlink(struct path *path, struct nameidata *nd) > > > > > > do { > > > struct path link = *path; > > > + struct inode *inode = link.dentry->d_inode; > > > void *cookie; > > > > > > res = follow_link(&link, nd, &cookie); > > > if (res) > > > break; > > > res = walk_component(nd, path, LOOKUP_FOLLOW); > > > - put_link(nd, &link, cookie); > > > + put_link(nd, &link, inode, cookie); > > > } while (res > 0); > > > > That's really unpleasant - it means increased stack footprint in the > > recursion. > > > > Damn, maybe it's time to bite the bullet and kill the recursion completely... > > > > What do we really need to save across the recursive call? > > * how far did we get in the previous pathname > > * data needed for put_link: > > cookie > > link body > > dentry of link > > vfsmount (to pin containing fs; non-RCU) or inode (RCU) > > > > We are already saving link body in nameidata, so we could fatten that array. > > It would allow flattening link_path_walk() completely - instead of > > recursive call we would just save what needed saving and jump to the beginning > > and on exits we'd check the depth and either return or restore the saved state > > and jump back to just past the place where recursive call used to be. > > It would even save quite a bit of space in the worst case. However, it would > > blow the stack footprint in normal cases *and* blow it even worse for the > > things that need two struct nameidata instances at once (rename(), basically). > > 5 pointers instead of 1 pointer per level - extra 32 words on stack, i.e. > > extra 256 bytes on 64bit. Extra 0.5Kb of stack footprint on rename() is > > probably too much, especially since this "saved" stuff from its two nameidata > > instances will never be used at the same time... > > > > Alternatively, we could just allocate about a page worth of an array when > > the depth of nesting goes beyond 2 and put this saved stuff there - at > > 5 pointers per level it would completely dispose of the depth of nesting > > limit, giving us uniform "can't traverse more than 40 symlinks per pathname > > resolution". 40 * 5 * sizeof(pointer) is what, at most 1600 bytes? So > > even half a page would suffice for that quite comfortably... > > > > The question is whether we'll be able to avoid blowing the I-cache footprint > > of link_path_walk() to hell while doing that; it feels like we should be, > > but we'll have to see how well does that work in reality... > > > > I'll try to implement that (with your #3..#7 as the first steps) and see > > how well does it work; it's obviously the next cycle fodder, but hopefully > > in testable shape by -rc2... > > Hmm... Actually, right now we have 192 bytes of stack footprint per > nesting level (amd64 allmodconfig). Which means that simply making the > array fatter would give a clean benefit at the 3rd level of recursion (symlink > encountered while traversing a symlink) for everything other than rename()... > allnoconfig+64bit gives 160 bytes per level, with the same breakeven point. > > Interesting... It might even make sense to separate that array from > struct nameidata and solve the rename() problem that way (current->nameidata > would be replaced with pointer to that sucker in such variant, of course, and > ->depth would move there). In that variant we do not get rid of nesting limit > completely, but it would probably be simpler than the one above... OK, right now in my tree recursion is gone, it seems to survive the testing *and* stack footprint of link_path_walk() is 432 bytes. Less than the mainline with two nested symlinks, and I definitely see how to trim that down by another 64 bytes, which would put us within a spitting distance from what the mainline gets with a single symlink encountered in the middle of a pathname. It still needs more massage (link_path_walk() is ugly as hell right now), but I see how to clean it up, and porting the rest of Neil's RCU follow_link series on top of that shouldn't be hard. Obviously next cycle fodder, but if everything works out, we'll get serious stack footprint reduction *and* not falling out of lazy pathwalk whenever we run into a symlink. I've dumped the current branch in vfs.git#link_path_walk; more after I get some sleep... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html