On Fri, Jun 5, 2009 at 7:50 AM, Brian Budge <brian.budge@xxxxxxxxx> wrote:
> I think that one famous experiment showcasing a small subset of
> problems is probably not enough to definitively say that we should be
> packing our structs :)  As always, experimentation and profiling on
> your individual program should be done before overriding the compiler
> with a micro-optimization.

Hi Brian,

First, as to your alignment question: it's 20 because I'm compiling for a 32-bit target, so yes, I expect your result of 24 is due to x86-64. With the packed attribute on, it drops to 17.

Secondly, as to the more general problem, I think the statistics of most real-world usage will bear out the results of this experiment. (In other words, I think the default for x86(-64) should be packed, but I'll leave that flame war to others.)

Granted, the experiment pitted an extremely bus-intensive algorithm against a large number of starving cores. But it should be clear that, as the number of cores approaches infinity, the rate of forward progress of any cache-blowing algorithm approaches a linear relationship with the frontside bus frequency rather than with the number of cores (i.e., per-core utilization approaches zero). If each unit of work drags B bytes across the bus and the bus delivers at most R bytes per second, aggregate throughput is capped at R/B no matter how many cores you add; that is why a footprint reduction can ideally buy a concomitant, reciprocal performance improvement, since halving B doubles the ceiling. This assumes, of course, that there are multiple cores on a single frontside bus, as opposed to a network of individual cores; such is the case with most x86 configurations.

I can imagine that unless this problem gets solved in the hardware, algorithms running on centicore chips will actually perform better if they use full-blown data compression when talking across the frontside bus.

But yes, in any specific case, I am still in favor of profiling.
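
P.S. Since the actual struct was never posted in this thread, here's a hypothetical layout I picked purely because it reproduces the sizes we've been trading (20 on i386, 24 on x86-64, 17 packed). A minimal sketch, assuming GCC-style attributes:

#include <stdio.h>

/* Hypothetical layout, chosen only to reproduce the sizes discussed
 * in this thread; the original struct definition was not posted. */
struct sample {
    double d;   /* 8 bytes */
    int    a;   /* 4 bytes */
    int    b;   /* 4 bytes */
    char   c;   /* 1 byte  */
};

struct sample_packed {
    double d;
    int    a;
    int    b;
    char   c;
} __attribute__((packed));   /* GCC/Clang extension */

int main(void)
{
    /* On i386 (double aligned to 4): offsets 0, 8, 12, 16, padded to
     * a multiple of 4 -> 20.  On x86-64 (double aligned to 8): the
     * same 17 bytes of members, padded to a multiple of 8 -> 24.
     * Packed: 17 on both, at the price of unaligned member accesses. */
    printf("natural: %u\n", (unsigned)sizeof(struct sample));
    printf("packed:  %u\n", (unsigned)sizeof(struct sample_packed));
    return 0;
}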
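
P.P.S. To put rough numbers on the scaling argument: the toy model below is mine, not a measurement, and the per-core demand and bus supply figures are invented. It just shows utilization collapsing toward R/(n*C) once the shared bus saturates.

#include <stdio.h>

/* Toy model of a shared frontside bus: n cores that could each consume
 * C bytes/s of data, fed by a bus that supplies at most R bytes/s.
 * Both constants are made up for illustration. */
int main(void)
{
    const double C = 4e9;   /* hypothetical per-core demand, bytes/s */
    const double R = 8e9;   /* hypothetical bus supply, bytes/s      */

    for (int n = 1; n <= 64; n *= 2) {
        double demand   = n * C;
        double progress = demand < R ? demand : R;  /* bus-capped */
        printf("%2d cores: utilization %5.1f%%\n",
               n, 100.0 * progress / demand);
    }
    return 0;
}

Past the saturation point, adding cores only dilutes utilization; shrinking the bytes each work item drags across the bus raises the ceiling by the same factor, which is the reciprocal improvement I mean above.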