On 6/11/2017 7:06 PM, Toby Douglass wrote:
On Sun, Jun 11, 2017 at 10:18 PM, tim prince via gcc-help
<gcc-help@xxxxxxxxxxx> wrote:
Is there a way to find out the maximum alignment supported by the linker?
You may need to look at the binutils source (if your gcc uses GNU ld),
where the maximum supported alignment is defined.
Not documented as such - source code time. That's okay.
32-byte alignment needs
to be supported for x86-64 platforms beginning with Intel Nehalem; 64-byte
alignment for those which support AVX512. So it is a poor quality build of
binutils which doesn't meet those requirements. Not many years ago, the
binutils default on Windows was less than 16-byte alignment, so it was
useful to build your own copy with this corrected, but there may be a limit
to what the platform can support. Surely, you can run objdump on your
executables and see what alignments you get in those spots where you set a
request. Agreed that 128-byte alignment on x86 might avoid some situations
where the prefetcher brings in a useless adjacent cache line, but the cache
line size is 64 bytes.
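
As a rough illustration of checking "what alignments you get", here is a minimal C11 sketch. The buffer names and the is_aligned() helper are hypothetical, made up for this example; it assumes a toolchain that honors alignas on static objects up to 128 bytes.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical buffers, each requesting a different alignment. */
static alignas(32)  unsigned char buf32[32];
static alignas(64)  unsigned char buf64[64];
static alignas(128) unsigned char buf128[128];

/* Returns 1 if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Calling is_aligned(buf128, 128) at run time tells you whether the request survived compiling and linking. The same thing can be checked statically: objdump -h prints an "Algn" column for each section, and objdump -t lists symbol addresses (if the symbols have not been stripped), which you can compare against the alignment you asked for.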
I read just a few days ago that Intel x86 (not x86_64) cache line
lengths varied by processor from 32 to 128 bytes. AFAIK, x86_64 is
always 64 bytes, but I have no strong confirmation of that. I have
read that Intel hardware these days always brings over two cache lines
at a time (sometimes this is something you can switch off in the BIOS),
so if you're trying to isolate a variable from other activity you have
to treat the cache line size as if it's 128 bytes.
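
A minimal C11 sketch of that kind of isolation, assuming 128 bytes (two 64-byte lines) is the right distance for the target machine. ISOLATION_BYTES and the struct names are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: treat two adjacent 64-byte cache lines as one unit. */
#define ISOLATION_BYTES 128

/* Aligning the member to 128 bytes also pads the struct out to 128 bytes. */
struct isolated_counter {
    alignas(ISOLATION_BYTES) unsigned long value;
};

/* Two counters intended to be written by different threads. */
struct counters {
    struct isolated_counter a;
    struct isolated_counter b;
};

/* Compile-time checks that the layout is what we asked for. */
static_assert(sizeof(struct isolated_counter) == ISOLATION_BYTES,
              "each counter should occupy its own 128-byte block");
static_assert(offsetof(struct counters, b) - offsetof(struct counters, a)
                  == ISOLATION_BYTES,
              "counters should start 128 bytes apart");
```

The static_assert lines fail the build if the compiler does not produce the expected layout. Whether a static or global instance of struct counters actually lands on a 128-byte boundary still depends on the linker honoring the request, which is the original question.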
Thanks for your help, Tim!
Running in 32-bit mode (should be unusual nowadays) doesn't change cache
line size or BIOS settings.
Turning off adjacent cache line prefetch has been recommended for years
when running database or similar server applications. It could easily
be the right thing if your application requires one thread to read
within 128 bytes of where another writes, a potential false sharing
case, as well as the database situation where the second cache line is
unlikely to be used. You're correct that client BIOS typically doesn't
offer a choice, and that the effect is much as if the cache line length
for reads were extended. Also, past CPUs fetched cache lines in a
variable sequence of 16-byte chunks (but always ended up with 64 bytes).