On Fri, Feb 19 2016, Theodore Ts'o <tytso@xxxxxxx> wrote:

> On Thu, Feb 18, 2016 at 09:10:21PM +0100, Rasmus Villemoes wrote:
>>
>> Sure, that would work as well. I don't really care how ->iname is pushed
>> out to offset 32, but I'd like to know if it's worth it.
>
> Do you have access to one of these platforms where unaligned access is
> really painful?

No. But FWIW, I did a microbenchmark on my aging Core2, doing nothing
but lstat() on the same "aaaa..." string in a loop. 'before' is 4.4.2
with a few unrelated patches, 'after' is that plus 1/2 and 2/2. In
perf_x_y, x is length of "aaa..." string and y is alignment mod 8 in
userspace.

$ grep strncpy_from_user *.report
perf_30_0_after.report:      5.47%  s_f_u  [k] strncpy_from_user
perf_30_0_before.report:     7.40%  s_f_u  [k] strncpy_from_user
perf_30_3_after.report:      5.05%  s_f_u  [k] strncpy_from_user
perf_30_3_before.report:     7.29%  s_f_u  [k] strncpy_from_user
perf_30_4_after.report:      4.88%  s_f_u  [k] strncpy_from_user
perf_30_4_before.report:     7.28%  s_f_u  [k] strncpy_from_user
perf_30_6_after.report:      5.43%  s_f_u  [k] strncpy_from_user
perf_30_6_before.report:     6.74%  s_f_u  [k] strncpy_from_user
perf_40_0_after.report:      5.68%  s_f_u  [k] strncpy_from_user
perf_40_0_before.report:    10.99%  s_f_u  [k] strncpy_from_user
perf_40_3_after.report:      5.37%  s_f_u  [k] strncpy_from_user
perf_40_3_before.report:    10.62%  s_f_u  [k] strncpy_from_user
perf_40_4_after.report:      5.61%  s_f_u  [k] strncpy_from_user
perf_40_4_before.report:    10.91%  s_f_u  [k] strncpy_from_user
perf_40_6_after.report:      5.81%  s_f_u  [k] strncpy_from_user
perf_40_6_before.report:    10.84%  s_f_u  [k] strncpy_from_user
perf_50_0_after.report:      6.29%  s_f_u  [k] strncpy_from_user
perf_50_0_before.report:    12.46%  s_f_u  [k] strncpy_from_user
perf_50_3_after.report:      7.15%  s_f_u  [k] strncpy_from_user
perf_50_3_before.report:    14.09%  s_f_u  [k] strncpy_from_user
perf_50_4_after.report:      7.64%  s_f_u  [k] strncpy_from_user
perf_50_4_before.report:    14.10%  s_f_u  [k] strncpy_from_user
perf_50_6_after.report:      7.30%  s_f_u  [k] strncpy_from_user
perf_50_6_before.report:    14.10%  s_f_u  [k] strncpy_from_user
perf_60_0_after.report:      6.81%  s_f_u  [k] strncpy_from_user
perf_60_0_before.report:    13.25%  s_f_u  [k] strncpy_from_user
perf_60_3_after.report:      9.48%  s_f_u  [k] strncpy_from_user
perf_60_3_before.report:    13.26%  s_f_u  [k] strncpy_from_user
perf_60_4_after.report:      9.90%  s_f_u  [k] strncpy_from_user
perf_60_4_before.report:    15.09%  s_f_u  [k] strncpy_from_user
perf_60_6_after.report:      9.91%  s_f_u  [k] strncpy_from_user
perf_60_6_before.report:    13.85%  s_f_u  [k] strncpy_from_user

So the numbers vary and it's a bit odd that some of the
userspace-unaligned cases seem faster than the corresponding aligned
ones, but overall I think it's ok to conclude there's a measurable
difference.

Note the huge jump from 30_y_before to 40_y_before. I suppose that's
because we do an unaligned store crossing a cache line boundary when the
string is > 32 bytes.

I suppose 2/2 is also responsible for some of the above, since it not
only aligns the kernel-side stores, but also means we stay within a
single cacheline for strings up to 56 bytes. I should measure the effect
of 1/2 by itself, but compiling a kernel takes forever for me, so I
won't get to that tonight.

[It turns out that 32 is the median length from 'git ls-files' in the
kernel tree, with 33.2 being the mean, so even though I used relatively
long paths above to get strncpy_from_user to stand out, such path
lengths are not totally uncommon.]

> The usual thing is to benchmark something like "git
> stat" which has to stat every single file in a repository's working
> directory.

I tried that as well; strncpy_from_user was around 0.5% both before and
after.

Rasmus
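
For reference, a minimal sketch of the kind of microbenchmark described above: lstat() called repeatedly on one "aaaa..." string of a chosen length, placed at a chosen alignment mod 8 in userspace. This is not the code used for the measurements; the names make_path, lstat_loop, path_buf and MAX_PATH_LEN are hypothetical, and the iteration count and the surrounding perf invocation are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical cap, comfortably above the 30..60 byte paths in the mail. */
enum { MAX_PATH_LEN = 255 };

/* 8-byte aligned backing store: +8 for the alignment offset, +1 for the NUL. */
static _Alignas(8) char path_buf[MAX_PATH_LEN + 8 + 1];

/* Return a string of `len` 'a' characters whose address is `align` mod 8. */
static const char *make_path(size_t len, size_t align)
{
	char *p;

	assert(len <= MAX_PATH_LEN && align < 8);
	assert((uintptr_t)path_buf % 8 == 0);

	p = path_buf + align;
	memset(p, 'a', len);
	p[len] = '\0';
	return p;
}

/*
 * Call lstat() on the same path `iters` times and return the number of
 * calls that succeeded. The kernel has to copy the path in from
 * userspace before it can look it up, so the copy is exercised whether
 * or not the file exists.
 */
static long lstat_loop(const char *path, long iters)
{
	struct stat st;
	long ok = 0;
	long i;

	for (i = 0; i < iters; i++)
		if (lstat(path, &st) == 0)
			ok++;
	return ok;
}
```

A driver would parse the length and alignment from argv, call lstat_loop(make_path(len, align), iters) with a large iters, and be run under perf once per (x, y) combination to produce reports like the perf_x_y files above.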