Am 08.04.2013 21:02, schrieb Justin Chudgar:
> On Thursday, April 04, 2013 04:08:43 PM Justin Chudgar wrote:
>> I had experimentally thrown an optimization into my module's only
>> significantly warm functions. Since I am a novice, this was a
>> just-for-kicks experiment, but I would like to know whether to optimize at
>> all beyond the general "-O2", and which platforms are critical to consider,
>> since I only use pulse on systems that are sufficient to run at "-O0"
>> without noticeable problems beyond unnecessary power consumption.
>>
>> From another thread:
>>> I'm not sure what to think about the __attribute__((optimize(3))) usage.
>>> Have you done some benchmarking that shows that the speedup is
>>> significant compared to the normal -O2? If yes, I guess we can keep
>>> them. <tanuk>
>> I don't know what to think of them either. I did a really simplistic
>> benchmark with the algorithm on my Core i3 laptop, initially to determine
>> whether it was useful to keep everything double or to use float. There was
>> no benefit to reducing precision on this one system, but that attribute
>> was dramatic. I did not try O2, though, just O3 and O0. I thought about
>> messing with vectorization, but I only have x86-64 PCs, and that seems
>> most valuable for embedded devices, which I cannot test at the moment.
>>
>> 11: Determine optimization strategy for filter code.
>> http://github.com/justinzane/pulseaudio/issues/issue/11
>>
>> _______________________________________________
>> pulseaudio-discuss mailing list
>> pulseaudio-discuss at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss
>
> Just some very simplistic benchmark results of
> "__attribute__((optimize(#))) function()"
> in code similar to a biquad filter:
>
> optimize(0), 1867570825, 27.828974
> optimize(1), 1017762024, 15.165836
> optimize(2), 951896198, 14.184359
> optimize(3), 952574300, 14.194463
>
> This is for "memchunk" analogs of single-channel 2^16 doubles being
> filtered and averaged over 2^10 runs with forced CPU affinity. The
> benchmark itself was compiled with -O0.
>
> With the supporting code compiled -O2, the numbers are:
>
> optimize(0), 1436955156, 21.412300
> optimize(1), 1020384309, 15.204911
> optimize(2), 952980992, 14.200523
> optimize(3), 952473365, 14.192959
>
> Not much difference there.
>
> With the benchmark compiled -O3, there is a DRASTIC change:
>
> optimize(0), 1442046736, 21.488171
> optimize(1), 1017924249, 15.168253
> optimize(2), 954029138, 14.216142
> optimize(3), 374432, 0.005579
>
> That was such a freakish improvement that I ran it several times, but the
> results are quite reliable on my dev system.

This seems wrong. Does the code still execute *correctly*? Does it even run
the benchmark at all at -O3? I suspect -O3 optimized large sections of the
code away, which may (or may not) produce incorrect code, perhaps because the
benchmark code relies on undefined behavior or on a bug in gcc.

Best regards.
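
P.S. One way to tell a real speedup from dead-code elimination is to give the
benchmark loop an observable side effect, so the compiler cannot legally
discard the filter work. The following is not your code, just a minimal
sketch of the kind of harness I mean: the function name biquad(), the
coefficients, the sizes SAMPLES/RUNS and the sink variable are all made up.
The only parts that matter are the volatile sink and consuming a result on
every run.

/* Hypothetical sketch, not the actual benchmark from this thread.
 * Build with, e.g.:  gcc -O0 bench.c -o bench   (or -O2/-O3 to compare;
 * add -lrt for clock_gettime on older glibc).
 * The volatile sink gives the loop an observable side effect, so the
 * optimizer cannot simply delete the filter work under -O3. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SAMPLES (1 << 16)   /* single-channel chunk size, as in the thread */
#define RUNS    (1 << 10)   /* number of repetitions to average over */

static volatile double sink;   /* keeps the results "live" for the optimizer */

__attribute__((optimize(3)))
static void biquad(const double *x, double *y, size_t n,
                   double b0, double b1, double b2,
                   double a1, double a2)
{
    /* direct form I biquad; coefficient values passed in are arbitrary */
    double xn1 = 0.0, xn2 = 0.0, yn1 = 0.0, yn2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double out = b0 * x[i] + b1 * xn1 + b2 * xn2 - a1 * yn1 - a2 * yn2;
        xn2 = xn1; xn1 = x[i];
        yn2 = yn1; yn1 = out;
        y[i] = out;
    }
}

int main(void)
{
    double *x = malloc(SAMPLES * sizeof(*x));
    double *y = malloc(SAMPLES * sizeof(*y));
    if (!x || !y)
        return 1;

    for (size_t i = 0; i < SAMPLES; i++)
        x[i] = (double) rand() / RAND_MAX - 0.5;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int r = 0; r < RUNS; r++) {
        /* made-up low-pass-ish coefficients, just to have real work */
        biquad(x, y, SAMPLES, 0.2929, 0.5858, 0.2929, 0.0, 0.1716);
        sink += y[SAMPLES - 1];   /* consume a result on every run */
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    long long ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL
                 + (t1.tv_nsec - t0.tv_nsec);
    printf("%lld ns total, %f s\n", ns, ns / 1e9);

    free(x);
    free(y);
    return 0;
}

If a harness of this shape still shows the optimize(3) case finishing in
fractions of a millisecond, then something else is going on; if the numbers
come back up into the same range as optimize(2), the earlier -O3 result was
most likely the loop being optimized away rather than a genuine speedup.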