Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Sat, Aug 15, 2015 at 2:07 PM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> Yes we can compare s_root and mnt_root and only call is_subdir if they don't match.
>
> Not even "is_subdir()" - for the RCU traversal case, just d_ancestor()
> should be sufficient since we'd already be in an RCU read-locked
> region and the RCU lookup checks the rename sequence number around it
> all.

We check the dentry sequence number and the mount sequence number, which
may be enough to catch a local rename but is certainly not enough to
catch what d_ancestor cares about.

Further, we have the partial rcu to non-rcu walk case represented by
unlazy_walk, which means we can't blithely do something that might be
wrong and only check the sequence numbers at each step.

> And d_ancestor() should really be pretty low-cost - even *if* we have
> to call it, which wouldn't even be the case for the normal situation.
>
>> At this point it is a matter of trade offs.
>>
>> If there is not an escape I do not expect my current implementation will have a measurable cost.
>> And I don't expect there will be any escapes.
>
> So the cost I worry about is not the CPU cost, but the complexity and
> correctness. If anything goes subtly wrong, the end result is going to
> be some very very subtle bugs.

Fair enough. I like simple low complexity code, but I don't want to mess
up the pathname lookup fastpath.

> And personally, I'd be much happier with something that is a bit more
> straightforward, even if it makes ".." lookup slower. Especially since
> I think we can limit the costs to fairly obvious cases (ie only for
> partial bind mounts). Keep the code more straightforward, and *if* we
> ever see the cost of dentry traversal
>
> But it's up to Al, I think.
>
> Al, comments?

At the very beginning of this I got shot down by Al Viro for a simple
implementation that essentially had everything except the check for
being a bind mount.
Knowing what I know now, I realize it was a bit buggy, calling
d_ancestor in the rcu walk instead of is_subdir, but it was shot down
for the cpu cost. Then Al suggested the basic approach I have taken in
these patches.

As soon as I am done testing I am going to post the revised version of
my final patch that only performs is_subdir checks on bind mounts. Then
we can decide to merge whichever version of the code you and Al are
happy with.

Eric
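
For reference, the shape of the check discussed in this thread (compare
mnt_root with s_root, and only walk the ancestry when they differ) can be
sketched in plain userspace C. This is not kernel code and not the patch
itself: the toy_dentry and toy_mount types and all toy_* functions are
hypothetical stand-ins, and the sketch ignores RCU, sequence counts and
locking entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: only the parent link is modelled. */
struct toy_dentry {
    struct toy_dentry *parent;   /* the filesystem root points to itself */
};

/* Hypothetical stand-in for a mount and the superblock behind it. */
struct toy_mount {
    struct toy_dentry *mnt_root; /* dentry the mount exposes as its root */
    struct toy_dentry *s_root;   /* root dentry of the whole filesystem */
};

/* True if d is root itself or lies somewhere beneath root. */
static bool toy_is_subdir(const struct toy_dentry *d,
                          const struct toy_dentry *root)
{
    for (;;) {
        if (d == root)
            return true;
        if (d->parent == d)      /* hit the filesystem root first */
            return false;
        d = d->parent;
    }
}

/*
 * True if d is still reachable from the root of mount m.
 * Only a mount of a subtree (mnt_root != s_root) can be escaped,
 * so the ancestry walk is skipped in the common case.
 */
static bool toy_path_connected(const struct toy_mount *m,
                               const struct toy_dentry *d)
{
    if (m->mnt_root == m->s_root)
        return true;
    return toy_is_subdir(d, m->mnt_root);
}
```

With a mount whose mnt_root equals s_root the function returns
immediately, so the extra walk is only paid on partial bind mounts, which
is the trade-off being weighed above.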