On Sat, Jul 30, 2016 at 01:44:48PM -0700, Linus Torvalds wrote:

> No. Let's not. "smp_rmb()" is completely free on x86 (ok, so it's a
> instruction scheduling barrer - close enough), so trying to optimize
> away rmb's and replacing them with double compares sounds entirely
> misdesigned.
>
> Yes, yes, there are other architectures where rmb is much more
> expensive. But quite frankly, in most cases those architectures have
> broken synchronization to begin with ("synchronization is unusual, so
> let's not optimize it"). They'll fix it eventually.
>
> Instead, what we should look at, is to make raw_seqcount_begin() use a
> smp_load_acquire() on architectures where that is cheaper than the
> rmb.
>
> But again, I don't see the point of double-testing "parent" when a
> load-acquire or load+rmb _should_ be cheap (and absolutely is on x86).

Umm...  Even on x86, a lot of hash chain elements will have ->d_parent
mismatch.  Suppose rmb was a no-op; on such an entry the current code does
	fetch ->d_seq
	fetch ->d_parent
	compare with register
	branch taken to the end of body
while this would avoid the first fetch.  On the entries with the same
->d_parent we'd do
	fetch ->d_parent
	compare with register
	branch not taken
	fetch ->d_seq
	fetch ->d_parent
	compare with register
	branch (expectedly) not taken
which is the same as the mainline in terms of actual memory accesses,
plus 3 extra insns.

I suspect that the win on the entries with ->d_parent mismatch can
outweigh that, but that needs profiling to verify.
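
For illustration only, the two orderings compared in this message can be sketched in userspace C11.  This is not the dcache code: struct entry, struct stats, scan_seq_first() and scan_parent_first() are hypothetical names, an acquire load stands in for "fetch ->d_seq plus smp_rmb()", and the counters tally fetches issued by the code, not cache misses.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: only the two fields discussed. */
struct entry {
	atomic_uint seq;              /* plays the role of ->d_seq */
	_Atomic(const void *) parent; /* plays the role of ->d_parent */
};

/* Counts fetches issued by the code (not cache misses). */
struct stats {
	unsigned long seq_fetches;
	unsigned long parent_fetches;
};

/* Mainline-style order: sample seq first, then compare parent. */
static size_t scan_seq_first(struct entry *chain, size_t n,
			     const void *parent, struct stats *st)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		/* acquire load assumed to model fetch + rmb */
		unsigned seq = atomic_load_explicit(&chain[i].seq,
						    memory_order_acquire);
		st->seq_fetches++;
		(void)seq;	/* a real walker would revalidate against it */

		st->parent_fetches++;
		if (atomic_load_explicit(&chain[i].parent,
					 memory_order_relaxed) != parent)
			continue;	/* mismatch: seq fetch was wasted */
		hits++;
	}
	return hits;
}

/* Proposed order: cheap parent pre-check, then seq, then recheck parent. */
static size_t scan_parent_first(struct entry *chain, size_t n,
				const void *parent, struct stats *st)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		st->parent_fetches++;
		if (atomic_load_explicit(&chain[i].parent,
					 memory_order_relaxed) != parent)
			continue;	/* mismatch: ->d_seq never touched */

		unsigned seq = atomic_load_explicit(&chain[i].seq,
						    memory_order_acquire);
		st->seq_fetches++;
		(void)seq;

		/* recheck under the sampled seq, as in the second listing */
		st->parent_fetches++;
		if (atomic_load_explicit(&chain[i].parent,
					 memory_order_relaxed) != parent)
			continue;
		hits++;
	}
	return hits;
}
```

On a chain of four entries where only one has the wanted parent, the first variant issues 4 seq fetches and 4 parent fetches, the second 1 seq fetch and 5 parent fetches.  Whether that translates into a measurable win is exactly what would need profiling.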