Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 12 Jul 2011 20:56:35 -0700

On Tue, Jul 12, 2011 at 8:22 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> Oh, and another fun area is per-chain locks, of course ;-/  Look at
> __d_drop(); reengineering the callers of d_drop() to make sure that
> fscker's parent stays stable would be very painful and turning that
> into loop-based variant is going to be interesting.  Doable, but not
> fun...

So I'm almost convinced.

The reason I say "almost" is that I wonder if we couldn't just turn
the d_move() into basically a

   write_seqlock(rename_lock);
   write_seqcount_begin(dentry);
   write_seqcount_begin(target);

   unhash(dentry);  /* Takes and releases dentry/dentry-parent locks */
   unhash(target);  /* Takes and releases target/target-parent locks */

   tname = target->name;
   dname = dentry->name;

   rehash(dentry, tname); /* takes/releases locks as above.. */
   rehash(target, dname); /* takes/releases locks as above.. */

   write_seqcount_end(target);
   write_seqcount_end(dentry);
   write_sequnlock(rename_lock);

where the seqlock and sequence counts mean that any lookups (RCU or
not) while the thing was unhashed get automatically re-done, so the
fact that it's not at "atomic" move in between would not important.

See what I'm saying? Sure, we want to hold both the dentry and parent
locks when messing with dentry->d_parent: but does that really mean
that we want to hold all FOUR locks at the same time, and in
particular hold those *independent* locks (which is why we do that
pointer value comparison, to give ordering *outside* of the parenthood
issue)?

IOW, maybe we could just hold and release them pairwise, and at any
one point in time we'd only hold one pair of dentry/parent d_lock? No
dentry-pointer-based ordering required - but we'd still protect each
d_parent pointer _individually_.

But I didn't really look closely at the code. The above suggestion is
purely based on my high-level gut feel, and it's entirely possible
that there's some particular practical reason why it's a totally
broken idea, and I'm just a complete moron. I'm realizing, for
example, that maybe we can't even do the dentry seqcount without
holding d_lock?

It would take the locks a few more times, but it would avoid the nasty
lock ordering issues exactly because it drops them in between rather
than try to hold them all at once. And d_move() is *not* a performance
critical area, afaik, so the point would be to make the locking more
straightforward and avoid the horrible subtlety.

Crazy? Stupid? What am I missing? It's probably something obvious.

                            Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html