On Thursday, April 04, 2013 04:08:43 PM Justin Chudgar wrote:
> I had experimentally thrown an optimization into my module's only
> significantly warm functions. Since I am a novice, this was a
> just-for-kicks experiment, but I would like to know whether to optimize
> at all beyond the general "-O2", and what platforms are critical to
> consider, since I only use pulse on systems that are sufficient to run
> at "-O0" without noticeable problems beyond unnecessary power
> consumption.
>
> From another thread:
> > I'm not sure what to think about the __attribute__((optimize(3))) usage.
> > Have you done some benchmarking that shows that the speedup is
> > significant compared to the normal -O2? If yes, I guess we can keep
> > them. <tanuk>
>
> I don't know what to think of them either. I did a really simplistic
> benchmark with the algorithm on my Core i3 laptop, initially to determine
> whether it was worth keeping everything double or switching to float.
> There was no benefit to reducing precision on this one system, but that
> attribute made a dramatic difference. I did not try -O2, though, just -O3
> and -O0. I thought about messing with vectorization, but I only have
> x86-64 PCs, and that seems most valuable for embedded devices, which I
> cannot test at the moment.
>
> 11: Determine optimization strategy for filter code.
> http://github.com/justinzane/pulseaudio/issues/issue/11

Here are some very simplistic benchmark results for
"__attribute__((optimize(#))) function()" in code similar to a biquad
filter:

optimize(0), 1867570825, 27.828974
optimize(1), 1017762024, 15.165836
optimize(2), 951896198, 14.184359
optimize(3), 952574300, 14.194463

This is for "memchunk" analogs of a single channel of 2^16 doubles being
filtered, averaged over 2^10 runs, with forced CPU affinity. The benchmark
itself was compiled with -O0.

With the supporting code compiled with -O2, the numbers are:

optimize(0), 1436955156, 21.412300
optimize(1), 1020384309, 15.204911
optimize(2), 952980992, 14.200523
optimize(3), 952473365, 14.192959

Not much difference there. With the benchmark compiled with -O3, there is
a DRASTIC change:

optimize(0), 1442046736, 21.488171
optimize(1), 1017924249, 15.168253
optimize(2), 954029138, 14.216142
optimize(3), 374432, 0.005579

That was such a freakish improvement that I ran it several times, but the
results are quite reliable on my dev system.

Replacing optimize(#) with hot and compiling the whole thing with -O3
gives:

hot, 310780, 0.004631

And removing the __attribute__ altogether, again compiling the whole thing
with -O3, gives:

<NONE>, 333013, 0.004962

Being generally a novice using a VERY simplistic wrapper around a rather
simple function, I'm loath to draw too many conclusions. However, this
suggests that it might be worth using __attribute__((hot)) on any serious
number-crunching functions within pulse and adopting -O3 as the standard
compiler flag. If I can figure out oprofile or something similar, I'll try
to test this properly. I'd also like to hear general feedback about this,
since I'm just learning.

Thanks, all.
Justin
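
P.S. For anyone who wants to poke at this, here is a minimal sketch of the
kind of wrapper I am describing. It is illustrative only, not the actual
module code: the coefficients, names, and timing approach are placeholders,
and the CPU-affinity pinning mentioned above is left out.

/* bench_biquad.c -- illustrative sketch only, not the real module code.
 * Build with:  gcc -O0 -o bench_biquad bench_biquad.c
 * (add -lrt on old glibc for clock_gettime), then repeat with -O2/-O3
 * and with the per-function attribute changed or removed.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NFRAMES (1 << 16)   /* one "memchunk" analog: 2^16 doubles */
#define NRUNS   (1 << 10)   /* average the cost over 2^10 repetitions */

/* Direct-form-I biquad with made-up coefficients; the per-function
 * attribute is the thing being measured, so swap it between
 * optimize(0..3), hot, or nothing between builds. */
__attribute__((optimize(3)))
static void biquad(double *dst, const double *src, size_t n) {
    /* arbitrary low-pass-ish coefficients, just to keep the FPU busy */
    const double b0 = 0.2, b1 = 0.4, b2 = 0.2, a1 = -0.3, a2 = 0.1;
    double xm1 = 0.0, xm2 = 0.0, ym1 = 0.0, ym2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double x0 = src[i];
        double y0 = b0*x0 + b1*xm1 + b2*xm2 - a1*ym1 - a2*ym2;
        dst[i] = y0;
        xm2 = xm1; xm1 = x0;
        ym2 = ym1; ym1 = y0;
    }
}

static long long ns_now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void) {
    double *src = malloc(NFRAMES * sizeof(double));
    double *dst = malloc(NFRAMES * sizeof(double));
    if (!src || !dst)
        return 1;
    for (size_t i = 0; i < NFRAMES; i++)
        src[i] = (double)rand() / RAND_MAX - 0.5;

    long long total = 0;
    for (int run = 0; run < NRUNS; run++) {
        long long t0 = ns_now();
        biquad(dst, src, NFRAMES);
        total += ns_now() - t0;
    }
    /* print the average nanoseconds per run; printing a sample of dst
     * keeps the result live so the filter call cannot be discarded */
    printf("avg %lld ns/run (last sample %f)\n",
           total / NRUNS, dst[NFRAMES - 1]);
    free(src);
    free(dst);
    return 0;
}

Building that file a few times with -O0/-O2/-O3 and toggling the attribute
should reproduce the general shape of the results above, even if the
absolute numbers differ from machine to machine.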