On Fri, Sep 06, 2013 at 05:58:51PM -0700, Linus Torvalds wrote:
> On Fri, Sep 6, 2013 at 5:19 PM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > (We're bounded in practice by PATH_MAX, so you can't make getcwd()
> > traverse more than about 2000 parents (single character filename plus
> > the slash for each level), and for all I know filesystems might cap it
> > before that, so it's not unbounded, but the difference between "1" and
> > "2000" is pretty damn big)
>
> .. in particular, it's big enough that one is pretty much guaranteed
> to fit in any reasonable L1 cache (if we have dentry hash chains so
> long that that becomes a problem for traversing a single chain, we're
> screwed anyway), while the other can most likely be a case of "not a
> single L1 cache hit because by the time you fail and go back to the
> start, you've flushed the L1 cache".
>
> Now, whether 2000 L2 cache misses is long enough to give people a
> chance to run the whole rename system call path in a loop a few times,
> I don't know, but it sure as heck sounds likely.
>
> Of course, you might still ask "why should we even care?" At least
> without preemption, you might be able to trigger some really excessive
> latencies and possibly a watchdog screaming at you as a result. But
> that said, maybe we wouldn't care. I just think that the solution is
> so simple (what, five extra lines or so) that it's worth avoiding even
> the worry.

We already have that kind of logic - see select_parent() et al. in
mainline or d_walk() in vfs.git#for-linus (pull request will go in a
few minutes).
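A minimal userspace sketch of the bounded-retry idea, assuming C11 `<stdatomic.h>` and POSIX threads: make up to `lockless_tries` optimistic passes under a sequence count, then one final pass with the writer lock held, which a concurrent rename cannot restart. Everything here (`toy_seqlock`, `toy_node`, `toy_move()`, `toy_depth()`) is hypothetical; it is not the kernel's seqlock API and only models the control flow, not the dcache.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy seqlock: even count = no writer, odd = writer active. */
struct toy_seqlock {
    atomic_uint sequence;
    pthread_mutex_t lock;              /* serialises writers */
};

static void toy_seqlock_init(struct toy_seqlock *sl)
{
    atomic_init(&sl->sequence, 0);
    pthread_mutex_init(&sl->lock, NULL);
}

static unsigned toy_read_seqbegin(struct toy_seqlock *sl)
{
    unsigned seq;
    while ((seq = atomic_load_explicit(&sl->sequence, memory_order_acquire)) & 1u)
        ;                              /* writer active: wait for an even count */
    return seq;
}

static bool toy_read_seqretry(struct toy_seqlock *sl, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);  /* data loads before the re-check */
    return atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}

static void toy_write_seqlock(struct toy_seqlock *sl)
{
    pthread_mutex_lock(&sl->lock);
    atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* odd count before data stores */
}

static void toy_write_sequnlock(struct toy_seqlock *sl)
{
    assert(atomic_load_explicit(&sl->sequence, memory_order_relaxed) & 1u);
    atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_release);
    pthread_mutex_unlock(&sl->lock);
}

struct toy_node {
    _Atomic(struct toy_node *) parent; /* NULL marks the root */
};

static void toy_node_init(struct toy_node *n, struct toy_node *parent)
{
    atomic_init(&n->parent, parent);
}

/* Hypothetical "rename": reparent n as a writer. */
static void toy_move(struct toy_seqlock *sl, struct toy_node *n,
                     struct toy_node *new_parent)
{
    toy_write_seqlock(sl);
    atomic_store_explicit(&n->parent, new_parent, memory_order_relaxed);
    toy_write_sequnlock(sl);
}

/* Count the ancestors of leaf.  Make up to lockless_tries optimistic passes,
 * then one final pass holding the writer lock, which toy_move() cannot
 * invalidate.  *used_lock reports which kind of pass produced the answer. */
static int toy_depth(struct toy_seqlock *sl, struct toy_node *leaf,
                     int lockless_tries, bool *used_lock)
{
    for (int attempt = 0;; attempt++) {
        bool locked = attempt >= lockless_tries;
        unsigned seq = 0;
        int depth = 0;

        if (locked)
            toy_write_seqlock(sl);
        else
            seq = toy_read_seqbegin(sl);

        /* Cap at 2048 levels (about PATH_MAX / 2); a torn lockless walk is
         * caught by the retry check below. */
        for (struct toy_node *n = leaf; depth < 2048 &&
             (n = atomic_load_explicit(&n->parent, memory_order_relaxed)) != NULL;)
            depth++;

        if (locked) {
            toy_write_sequnlock(sl);
            *used_lock = true;
            return depth;
        }
        if (!toy_read_seqretry(sl, seq)) {
            *used_lock = false;
            return depth;
        }
        /* A writer ran during the walk: try again (or fall back to the lock). */
    }
}
```

Passing 1 for `lockless_tries` models the "try once" behaviour of select_parent()/d_walk(); passing 3 models the d_path() behaviour. Either way, the last pass runs with the writer lock held, so a stream of renames can only make `toy_move()` wait. The walk is then bounded by the chain length rather than by how quickly renames arrive, which is the latency concern quoted above.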
With this patch we get
  * plain seqretry loop (d_lookup(), is_subdir(), autofs4_getpath(),
    ceph_mdsc_build_path(), [cifs] build_path_from_dentry(), nfs_path(),
    [audit] handle_path())
  * try seqretry once, then switch to write_seqlock() (the things that
    got unified into d_walk())
  * try seqretry three times, then switch to write_seqlock() (d_path() and
    friends)
  * several pure write_seqlock() users (d_move(), d_set_mounted(),
    d_materialise_unique())

The last class is not a problem - these we want as writers. I really don't
like the way the rest is distributed - if nothing else, nfs_path() and
friends are in exactly the same situation as d_path(). Moreover, why the
distinction between "try once" and "try thrice"?

_If_ we fold the second and the third groups together (and probably have a
bunch from the first one join that), we at least get something
understandable, but then I really wonder if seqlock has the right calling
conventions for that (and at least I'd like to fold the "already got
writelock" flag into seq - we do have a spare bit there).

Comments?
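A sketch of folding the "already got writelock" flag into `seq`, assuming that a stable sequence value is always even (the writer makes it odd only while holding the lock), so bit 0 of a reader's snapshot is the spare bit. The caller keeps one `unsigned` of state: bit 0 clear means "take a lockless snapshot", set means "take the writer lock", and a locked pass is never asked to repeat. All names (`toy_begin_or_lock()`, `toy_need_retry()`, `toy_done()`, `toy_depth()` and the toy seqlock itself) are hypothetical userspace stand-ins built on C11 atomics and POSIX threads, not the kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy seqlock: even count = no writer, odd = writer active. */
struct toy_seqlock {
    atomic_uint sequence;
    pthread_mutex_t lock;              /* serialises writers */
};

static void toy_seqlock_init(struct toy_seqlock *sl)
{
    atomic_init(&sl->sequence, 0);
    pthread_mutex_init(&sl->lock, NULL);
}

static unsigned toy_read_seqbegin(struct toy_seqlock *sl)
{
    unsigned seq;
    while ((seq = atomic_load_explicit(&sl->sequence, memory_order_acquire)) & 1u)
        ;                              /* so a snapshot always has bit 0 clear */
    return seq;
}

static bool toy_read_seqretry(struct toy_seqlock *sl, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}

static void toy_write_seqlock(struct toy_seqlock *sl)
{
    pthread_mutex_lock(&sl->lock);
    atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
}

static void toy_write_sequnlock(struct toy_seqlock *sl)
{
    assert(atomic_load_explicit(&sl->sequence, memory_order_relaxed) & 1u);
    atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_release);
    pthread_mutex_unlock(&sl->lock);
}

/* Bit 0 of *seq: clear = lockless pass (store a fresh snapshot), set = lock. */
static void toy_begin_or_lock(struct toy_seqlock *sl, unsigned *seq)
{
    if (*seq & 1u)
        toy_write_seqlock(sl);
    else
        *seq = toy_read_seqbegin(sl);
}

static bool toy_need_retry(struct toy_seqlock *sl, unsigned seq)
{
    return !(seq & 1u) && toy_read_seqretry(sl, seq);  /* locked: never retry */
}

static void toy_done(struct toy_seqlock *sl, unsigned seq)
{
    if (seq & 1u)
        toy_write_sequnlock(sl);
}

struct toy_node {
    _Atomic(struct toy_node *) parent; /* NULL marks the root */
};

/* Count ancestors of leaf with `tries` lockless passes before locking.
 * *seq_out returns the final state word (bit 0 set if the lock was used). */
static int toy_depth(struct toy_seqlock *sl, struct toy_node *leaf, int tries,
                     unsigned *seq_out)
{
    unsigned seq = tries > 0 ? 0u : 1u;    /* tries == 0: lock from the start */
    int attempt = 0, depth;

again:
    toy_begin_or_lock(sl, &seq);
    depth = 0;
    for (struct toy_node *n = leaf; depth < 2048 &&
         (n = atomic_load_explicit(&n->parent, memory_order_relaxed)) != NULL;)
        depth++;
    if (toy_need_retry(sl, seq)) {
        if (++attempt >= tries)
            seq = 1u;                      /* out of lockless tries: lock next */
        goto again;
    }
    toy_done(sl, seq);
    *seq_out = seq;
    return depth;
}
```

With this convention the second and third groups differ only in the `tries` argument, and the "already locked" state travels in the same word as the snapshot instead of in a separate flag.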