Re: [PATCH 1/2] vfs: make sure struct filename->iname is word-aligned

Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx> · Tue, 23 Feb 2016 00:59:57 +0100

On Fri, Feb 19 2016, Theodore Ts'o <tytso@xxxxxxx> wrote:

> On Thu, Feb 18, 2016 at 09:10:21PM +0100, Rasmus Villemoes wrote:
>> 
>> Sure, that would work as well. I don't really care how ->iname is pushed
>> out to offset 32, but I'd like to know if it's worth it.
>
> Do you have access to one of these platforms where unaligned access is
> really painful?

No. But FWIW, I did a microbenchmark on my aging Core2, doing nothing
but lstat() on the same "aaaa..." string in a loop. 'before' is 4.4.2
with a few unrelated patches, 'after' is that plus 1/2 and 2/2. In
perf_x_y, x is length of "aaa..." string and y is alignment mod 8 in
userspace.

$ grep strncpy_from_user *.report
perf_30_0_after.report:     5.47%  s_f_u    [k] strncpy_from_user            
perf_30_0_before.report:     7.40%  s_f_u    [k] strncpy_from_user         
perf_30_3_after.report:     5.05%  s_f_u    [k] strncpy_from_user             
perf_30_3_before.report:     7.29%  s_f_u    [k] strncpy_from_user          
perf_30_4_after.report:     4.88%  s_f_u    [k] strncpy_from_user            
perf_30_4_before.report:     7.28%  s_f_u    [k] strncpy_from_user        
perf_30_6_after.report:     5.43%  s_f_u    [k] strncpy_from_user            
perf_30_6_before.report:     6.74%  s_f_u    [k] strncpy_from_user        
perf_40_0_after.report:     5.68%  s_f_u    [k] strncpy_from_user            
perf_40_0_before.report:    10.99%  s_f_u    [k] strncpy_from_user        
perf_40_3_after.report:     5.37%  s_f_u    [k] strncpy_from_user            
perf_40_3_before.report:    10.62%  s_f_u    [k] strncpy_from_user        
perf_40_4_after.report:     5.61%  s_f_u    [k] strncpy_from_user            
perf_40_4_before.report:    10.91%  s_f_u    [k] strncpy_from_user        
perf_40_6_after.report:     5.81%  s_f_u    [k] strncpy_from_user            
perf_40_6_before.report:    10.84%  s_f_u    [k] strncpy_from_user          
perf_50_0_after.report:     6.29%  s_f_u    [k] strncpy_from_user            
perf_50_0_before.report:    12.46%  s_f_u    [k] strncpy_from_user        
perf_50_3_after.report:     7.15%  s_f_u    [k] strncpy_from_user            
perf_50_3_before.report:    14.09%  s_f_u    [k] strncpy_from_user                   
perf_50_4_after.report:     7.64%  s_f_u    [k] strncpy_from_user            
perf_50_4_before.report:    14.10%  s_f_u    [k] strncpy_from_user        
perf_50_6_after.report:     7.30%  s_f_u    [k] strncpy_from_user            
perf_50_6_before.report:    14.10%  s_f_u    [k] strncpy_from_user        
perf_60_0_after.report:     6.81%  s_f_u    [k] strncpy_from_user            
perf_60_0_before.report:    13.25%  s_f_u    [k] strncpy_from_user         
perf_60_3_after.report:     9.48%  s_f_u    [k] strncpy_from_user            
perf_60_3_before.report:    13.26%  s_f_u    [k] strncpy_from_user         
perf_60_4_after.report:     9.90%  s_f_u    [k] strncpy_from_user            
perf_60_4_before.report:    15.09%  s_f_u    [k] strncpy_from_user          
perf_60_6_after.report:     9.91%  s_f_u    [k] strncpy_from_user            
perf_60_6_before.report:    13.85%  s_f_u    [k] strncpy_from_user        

So the numbers vary and it's a bit odd that some of the
userspace-unaligned cases seem faster than the corresponding aligned
ones, but overall I think it's ok to conclude there's a measurable
difference.

Note the huge jump from 30_y_before to 40_y_before. I suppose that's
because we do an unaligned store crossing a cache line boundary when the
string is > 32 bytes.

I suppose 2/2 is also responsible for some of the above, since it not
only aligns the kernel-side stores, but also means we stay within a
single cacheline for strings up to 56 bytes. I should measure the effect
of 1/2 by itself, but compiling a kernel takes forever for me, so I
won't get to that tonight.

[It turns out that 32 is the median length from 'git ls-files' in the
kernel tree, with 33.2 being the mean, so even though I used relatively
long paths above to get strncpy_from_user to stand out, such path
lengths are not totally uncommon.]

> The usual thing is to benchmark something like "git
> stat" which has to stat every single file in a repository's working
> directory.

I tried that as well; strncpy_from_user was around 0.5% both before and
after.

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html