I think that one famous experiment showcasing a small subset of
problems is probably not enough to definitively say that we should be
packing our structs :)

As always, experimentation and profiling on your individual program
should be done before overriding the compiler with a
micro-optimization.

Brian

On Fri, Jun 5, 2009 at 1:19 AM, . .<pkejjy@xxxxxxxxx> wrote:
> On Fri, Jun 5, 2009 at 12:42 AM, me22<me22.ca@xxxxxxxxx> wrote:
>> 2009/6/4 . . <pkejjy@xxxxxxxxx>:
>>> If so, that would be a bad performance decision in this
>>> multicore world, where memory footprint size matters much more than
>>> alignment, generally speaking.
>>>
>>
>> But sharing of cache lines between unrelated things can be even worse.
>
> Yes, it could be. Certainly it depends on the specific case, but in
> general, I'd rather keep a small footprint. In a multicore (and
> certainly manycore) system, you have nothing to do all day anyway,
> except to think about your own small dataset, on account of a
> frequently saturated memory bus. So inefficient L1 data cache
> accesses are essentially a nonproblem.
>
> For the record, the problem is painfully clear in this now-famous experiment:
>
> http://www.spectrum.ieee.org/computing/hardware/multicore-is-bad-news-for-supercomputers
>
>>
>> And as you said, the point is to make the 64-bit ints properly aligned
>> when arrays of the type are used. There are numerous architectures
>> which cannot read mis-aligned types, and those that can are usually
>> fairly slow at it -- especially when they end up straddling cache
>> lines, as an array of 17-byte structs certainly would, at some point.
>
> Oh OK, so then this behavior must be occurring before GCC realizes
> that it's compiling for x86, which doesn't care (except for timing)
> about alignment.
>
>>
>> There is, iirc, an __attribute__ that'll let you pack it, if you
>> insist. Do you have profiler feedback that says it actually matters?
>>
>
> I don't specifically have profiler feedback, but I'm 100% sure that it
> will matter, when the codebase is complete, as this is part of a
> large, frequently-hit array which will be shared by lots of cores. It
> would be nice to have a command line switch, but I promise I'll shut
> up if you can tell me about this __attribute__. If nothing else, your
> advice might be Google-able for others who inevitably hit this
> problem. Keywords: bad performance large bloated typedef struct array
> c gcc size sizeof cache overflow aligned alignment unaligned
> inefficient memory usage footprint multicore manycore .
>
> Thanks for the details.
>