Re: Bloated Struct Problem

". ." <pkejjy@xxxxxxxxx> · Fri, 5 Jun 2009 01:19:26 -0700

On Fri, Jun 5, 2009 at 12:42 AM, me22<me22.ca@xxxxxxxxx> wrote:
> 2009/6/4 . . <pkejjy@xxxxxxxxx>:
>> If so, that would be a bad performance decision in this
>> multicore world, where memory footprint size matters much more than
>> alignment, generally speaking.
>>
>
> But sharing of cache lines between unrelated things can be even worse.

Yes, it could be.  Certainly it depends on the specific case, but in
general, I'd rather keep a small footprint.  In a multicore (and
certainly manycore) system, the memory bus is frequently saturated, so
each core has little to do all day except work on whatever small
dataset it already holds.  That makes inefficient L1 data cache
accesses essentially a nonproblem.

For the record, the problem is painfully clear in this now-famous experiment:

http://www.spectrum.ieee.org/computing/hardware/multicore-is-bad-news-for-supercomputers

>
> And as you said, the point is to make the 64-bit ints properly aligned
> when arrays of the type are used.  There are numerous architectures
> which cannot read mis-aligned types, and those that can are usually
> fairly slow at it -- especially when they end up straddling cache
> lines, as an array of 17-byte structs certainly would, at some point.

Oh OK, so then this behavior must be occurring before GCC realizes
that it's compiling for x86, which doesn't care (except for timing)
about alignment.
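
For anyone following along, here is a minimal sketch of how one might
check where the padding actually goes.  The struct is hypothetical (the
real one isn't shown in this thread); I'm assuming two 64-bit ints plus
one byte, which gives the 17 bytes of payload mentioned above.  The
names rec and rec_tail_padding are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the real struct is not shown in this thread.
 * Two 64-bit ints plus one byte is 17 bytes of actual payload. */
struct rec {
    uint64_t a;
    uint64_t b;
    uint8_t  flag;
};

/* Bytes of padding the compiler appended after the last member so that
 * every element of an array of struct rec stays properly aligned. */
size_t rec_tail_padding(void)
{
    size_t end_of_payload = offsetof(struct rec, flag) + sizeof(uint8_t);
    return sizeof(struct rec) - end_of_payload;
}
```

I'd expect sizeof(struct rec) to come out larger than 17 here, with
rec_tail_padding() reporting the difference, but the exact figure
depends on the target ABI rather than on anything in the source.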

>
> There is, iirc, an __attribute__ that'll let you pack it, if you
> insist.  Do you have profiler feedback that says it actually matters?
>

I don't specifically have profiler feedback, but I'm 100% sure that it
will matter, when the codebase is complete, as this is part of a
large, frequently-hit array which will be shared by lots of cores.  It
would be nice to have a command line switch, but I promise I'll shut
up if you can tell me about this __attribute__.  If nothing else, your
advice might be Google-able for others who inevitably hit this
problem.  Keywords: bad performance large bloated typedef struct array
c gcc size sizeof cache overflow aligned alignment unaligned
inefficient memory usage footprint multicore manycore.
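
For those who land here from a search: a minimal sketch of what I
understand the attribute in question to be, namely GCC's
__attribute__((packed)).  The struct layout is again an assumption (two
64-bit ints and one byte), and rec_padded, rec_packed and
bytes_saved_per_element are hypothetical names, not anything from a
real codebase.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may add tail padding for alignment. */
struct rec_padded {
    uint64_t a;
    uint64_t b;
    uint8_t  flag;
};

/* Same members, but GCC is asked to drop all padding.  In an array of
 * these, most of the 64-bit members will be misaligned. */
struct rec_packed {
    uint64_t a;
    uint64_t b;
    uint8_t  flag;
} __attribute__((packed));

/* Footprint saved per array element by packing. */
size_t bytes_saved_per_element(void)
{
    return sizeof(struct rec_padded) - sizeof(struct rec_packed);
}

/* Members of a packed struct should be accessed through the struct
 * itself, so the compiler can emit code that is safe for unaligned
 * data; taking a plain uint64_t pointer to a member is not. */
uint64_t sum_a(const struct rec_packed *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i].a;
    return total;
}
```

With this layout the packed struct is exactly 17 bytes, so an array of
1000 elements takes 17000 bytes.  The cost is the one me22 describes
above: misaligned 64-bit members, some of which will straddle cache
lines, so it still seems worth measuring before committing to it.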

Thanks for the details.