Re: [RFC PATCH] x86/64: Optimize the effective instruction cache footprint of kernel functions

Ingo Molnar <mingo@xxxxxxxxxx> · Thu, 21 May 2015 15:28:18 +0200

* Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:

> Can you post your .config for the test?
> If you have CONFIG_OPTIMIZE_INLINING=y in your -Os test,
> consider re-testing with it turned off.

Yes, I had CONFIG_OPTIMIZE_INLINING=y.

With that turned off, on GCC 4.9.2, I'm seeing:

 fomalhaut:~/linux/linux-____CC_OPTIMIZE_FOR_SIZE=y> size vmlinux.OPTIMIZE_INLINING\=*
      text     data     bss      dec    hex filename
  12150606  2565544 1634304 16350454 f97cf6 vmlinux.OPTIMIZE_INLINING=y
  12354814  2572520 1634304 16561638 fcb5e6 vmlinux.OPTIMIZE_INLINING=n

I.e. forcing the inlining increases the kernel text size again, by about 
1.7% (204,208 bytes).
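A quick way to double-check these size numbers is a few lines of Python. This is a sketch added for illustration, not part of the original measurement, and the helper name text_growth_pct is hypothetical. It assumes the Berkeley-format size output above, where "dec" is text + data + bss and "hex" is the same total in base 16:

```python
# Numbers copied from the "size vmlinux.OPTIMIZE_INLINING=*" output above.
SIZES = {
    "OPTIMIZE_INLINING=y": {"text": 12150606, "data": 2565544, "bss": 1634304,
                            "dec": 16350454, "hex": "f97cf6"},
    "OPTIMIZE_INLINING=n": {"text": 12354814, "data": 2572520, "bss": 1634304,
                            "dec": 16561638, "hex": "fcb5e6"},
}


def text_growth_pct(before: int, after: int) -> float:
    """Relative growth of `after` over `before`, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


# Berkeley-format size(1): dec = text + data + bss, hex = the same total in base 16.
for name, s in SIZES.items():
    assert s["text"] + s["data"] + s["bss"] == s["dec"], name
    assert int(s["hex"], 16) == s["dec"], name

y = SIZES["OPTIMIZE_INLINING=y"]["text"]
n = SIZES["OPTIMIZE_INLINING=n"]["text"]
print(f"text grows by {n - y} bytes ({text_growth_pct(y, n):.1f}%)")
```

Data and bss barely move between the two builds, so nearly all of the growth is in the text (code) segment.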

I re-ran the tests on the Intel system, and got these I$ miss rates:

linux-falign-functions=_64-bytes:                  647,853,942  L1-icache-load-misses  ( +-  0.07% )  (100.00%)
linux-falign-functions=_16-bytes:                  706,080,917  L1-icache-load-misses  ( +-  0.05% )  (100.00%)
linux-CC_OPTIMIZE_FOR_SIZE=y+OPTIMIZE_INLINING=y:  921,910,808  L1-icache-load-misses  ( +-  0.05% )  (100.00%)
linux-CC_OPTIMIZE_FOR_SIZE=y+OPTIMIZE_INLINING=n:  792,395,265  L1-icache-load-misses  ( +-  0.05% )  (100.00%)

So yeah, it got better - but the I$ miss rate is still 22.3% higher 
than that of the 64-byte aligned kernel and 12.2% higher than that of 
the vanilla (16-byte aligned) kernel.
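The relative I$ miss figures can be recomputed straight from the perf stat counts. A minimal Python sketch, illustrative only: pct_higher is a hypothetical helper name, and the dictionary keys are just shorthand for the kernel builds measured above:

```python
# L1-icache-load-misses from the perf stat runs above.
MISSES = {
    "falign-functions=64": 647_853_942,
    "falign-functions=16 (vanilla)": 706_080_917,
    "-Os, OPTIMIZE_INLINING=y": 921_910_808,
    "-Os, OPTIMIZE_INLINING=n": 792_395_265,
}


def pct_higher(value: int, baseline: int) -> float:
    """How much larger `value` is than `baseline`, in percent (hypothetical helper)."""
    return (value - baseline) / baseline * 100.0


os_n = MISSES["-Os, OPTIMIZE_INLINING=n"]
for name in ("falign-functions=64", "falign-functions=16 (vanilla)"):
    print(f"-Os (OPTIMIZE_INLINING=n) vs {name}: {pct_higher(os_n, MISSES[name]):+.1f}%")
```

Substituting the OPTIMIZE_INLINING=y count for os_n shows how much further behind the original -Os build was.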

The original OPTIMIZE_FOR_SIZE kernel (with OPTIMIZE_INLINING=y) had this elapsed time:

       8.531418784 seconds time elapsed                                          ( +-  0.19% )

With OPTIMIZE_INLINING=n, this improved to:

       7.686174880 seconds time elapsed                                          ( +-  0.18% )

but it's still about 7.4% slower than the 64-byte aligned kernel:

       7.154816369 seconds time elapsed                                          ( +-  0.03% )

and the 16-byte aligned (vanilla) one:

       7.333597250 seconds time elapsed                                          ( +-  0.48% )

> You may be seeing this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

Yeah, disabling OPTIMIZE_INLINING made a difference - but it didn't 
fully recover the performance loss: -Os is still 4.8% slower in this 
workload than the vanilla kernel.
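Likewise for the elapsed times: a small Python sketch (illustrative, using the hypothetical helper pct_slower) that turns the perf stat means into relative slowdowns of the OPTIMIZE_INLINING=n -Os kernel:

```python
# Mean elapsed times in seconds, from the perf stat runs above.
ELAPSED = {
    "-Os, OPTIMIZE_INLINING=y": 8.531418784,
    "-Os, OPTIMIZE_INLINING=n": 7.686174880,
    "falign-functions=64": 7.154816369,
    "falign-functions=16 (vanilla)": 7.333597250,
}


def pct_slower(t: float, baseline: float) -> float:
    """Percent by which elapsed time `t` exceeds `baseline` (hypothetical helper)."""
    return (t / baseline - 1.0) * 100.0


os_n = ELAPSED["-Os, OPTIMIZE_INLINING=n"]
for name in ("falign-functions=16 (vanilla)", "falign-functions=64"):
    print(f"-Os (OPTIMIZE_INLINING=n) vs {name}: {pct_slower(os_n, ELAPSED[name]):.1f}% slower")
```

The ( +- ) noise figures perf reported for these runs are all below 0.5%, well under the gaps being compared.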

Thanks,

	Ingo