Re: [LAU] optimizing jackd build

On 4/9/07, Tim Blechmann <tim@xxxxxxxxxx> wrote:
> Hand written assembler is still many orders faster than what gcc is
> capable of doing. In Ardour peak computation (for both metering and
> waveform displaying) is written in SSE (the first part in pure assembly,
> the second in a C-level abstraction which is almost 1:1 assembly). Both
> functions are more than 20x faster in raw performance than what gcc 4.1
> can do.

btw, is there a reason, why ardour is using assembler code instead of
compiler intrinsics?

Two issues. First, one of the core concepts of jack et al. is a samples/period value that is defined at run time. The compiler has no idea that a typical routine is always called with some multiple of 64 samples, so it can't unroll well.
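To make the unrolling point concrete, here is a minimal sketch in C of one way to hand the compiler a fixed trip count even though the period size is only known at run time. The names apply_gain, apply_gain_block and BLOCK are made up for illustration and are not taken from jack or Ardour; the multiple-of-64 assumption comes from the paragraph above.

```c
#include <stddef.h>

#define BLOCK 64  /* assumed: period sizes are usually a multiple of this */

/* Hypothetical example routine: scale exactly BLOCK samples in place. */
static void apply_gain_block(float *buf, float gain)
{
    /* The trip count is a compile-time constant here, so the compiler
       is free to unroll (or vectorize) this loop. */
    for (int k = 0; k < BLOCK; k++)
        buf[k] *= gain;
}

/* Hypothetical example routine: scale n samples, n only known at run time. */
void apply_gain(float *buf, size_t n, float gain)
{
    size_t i = 0;

    /* Fast path: whole blocks of BLOCK samples. */
    for (; i + BLOCK <= n; i += BLOCK)
        apply_gain_block(buf + i, gain);

    /* Slow path: whatever is left when n is not a multiple of BLOCK. */
    for (; i < n; i++)
        buf[i] *= gain;
}
```

Whether this actually helps depends on the compiler and flags, so measure before trusting it.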

Secondly, the compiler intrinsics for SSE1,2,3,4 basically suck. You can use the _mm_whatever abstractions fairly effectively, but as soon as you get into type casting you are in a world of hurt and the compiler generates very inefficient code.
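For readers who have not seen it, this is roughly what the "type casting" between float and integer SSE registers looks like with the intrinsics. It is only a sketch: abs4 is a hypothetical helper, the cast intrinsics need SSE2 (emmintrin.h), and it makes no claim about the quality of the code any particular compiler generates for it.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: __m128i and the cast intrinsics */
#endif

/* Hypothetical helper: out[k] = |in[k]| for 4 floats, no alignment assumed. */
void abs4(const float *in, float *out)
{
#if defined(__SSE2__)
    /* Integer mask with every bit set except the sign bit of each lane. */
    __m128i imask = _mm_set1_epi32(0x7fffffff);
    /* Reinterpret the integer register as four floats; the bits are
       not converted, only the type changes. */
    __m128 fmask = _mm_castsi128_ps(imask);
    _mm_storeu_ps(out, _mm_and_ps(_mm_loadu_ps(in), fmask));
#else
    /* Plain C fallback for targets without SSE2. */
    for (size_t k = 0; k < 4; k++)
        out[k] = in[k] < 0.0f ? -in[k] : in[k];
#endif
}
```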

beside that, if ardour is using a fixed block size, using compile-time

Would be nice, but not enough hardware can run at low samples/period and there are always situations where you want to run at more.

loop unrolling would be another point, where one could gain speed (iirc,
the micro-benchmarks i did for pnpd/nova indicated an additional
performance boost around 40%) ...

Consistently memory aligning things is an issue on x86.

The compiler can't figure this out (it would be nice if there were some compiler intrinsic that said "this routine is nearly always called with some multiple of 32 bytes"), so the hand-unrolled routines (more every day) basically have to:
1. loop normally until you have alignment (hopefully just a test and branch)
2. loop in 16 byte quantities until you can do 64 (on some arches, looping in 64 byte quantities is a bigger win than 16)
3. then do 64 byte quantities for a while
4. then back to 16
5. then back to 4
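
The pattern above can be sketched in C with SSE1 intrinsics, using a peak (largest absolute value) scan as the example since that is the routine mentioned earlier in the thread. This is an illustration under assumptions, not Ardour's actual code: peak_abs is a made-up name, the buffer is assumed to hold ordinary 4-byte-aligned floats, and a scalar fallback is used when SSE is not available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: largest |x| in buf[0..n). */
float peak_abs(const float *buf, size_t n)
{
    float peak = 0.0f;
    size_t i = 0;

    /* Scalar loop until the pointer is 16-byte aligned. */
    while (i < n && ((uintptr_t)(buf + i) & 15) != 0) {
        float a = buf[i] < 0.0f ? -buf[i] : buf[i];
        if (a > peak) peak = a;
        i++;
    }
#if defined(__SSE__)
    const __m128 sign = _mm_set1_ps(-0.0f);  /* only the sign bits set */
    __m128 vmax = _mm_setzero_ps();

    /* 16 bytes at a time until the pointer is 64-byte aligned. */
    while (i + 4 <= n && ((uintptr_t)(buf + i) & 63) != 0) {
        vmax = _mm_max_ps(vmax, _mm_andnot_ps(sign, _mm_load_ps(buf + i)));
        i += 4;
    }
    /* 64 bytes (16 floats) at a time for the bulk of the buffer. */
    for (; i + 16 <= n; i += 16) {
        __m128 a = _mm_andnot_ps(sign, _mm_load_ps(buf + i));
        __m128 b = _mm_andnot_ps(sign, _mm_load_ps(buf + i + 4));
        __m128 c = _mm_andnot_ps(sign, _mm_load_ps(buf + i + 8));
        __m128 d = _mm_andnot_ps(sign, _mm_load_ps(buf + i + 12));
        vmax = _mm_max_ps(vmax, _mm_max_ps(_mm_max_ps(a, b), _mm_max_ps(c, d)));
    }
    /* Back to 16 bytes at a time for what is left. */
    for (; i + 4 <= n; i += 4)
        vmax = _mm_max_ps(vmax, _mm_andnot_ps(sign, _mm_load_ps(buf + i)));

    /* Fold the four lanes into the scalar result. */
    float lanes[4];
    _mm_storeu_ps(lanes, vmax);
    for (int k = 0; k < 4; k++)
        if (lanes[k] > peak) peak = lanes[k];
#endif
    /* Back to 4 bytes (one float) at a time for the tail. */
    for (; i < n; i++) {
        float a = buf[i] < 0.0f ? -buf[i] : buf[i];
        if (a > peak) peak = a;
    }
    return peak;
}
```

When the buffer is already aligned and n is a multiple of 16, the first two loops and the last two cost only a test and branch each.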

It's a pretty easy pattern once you get used to it, but it pays to oprofile first, have the best algorithm second, then... SSE like crazy. :)

tim

--
tim@xxxxxxxxxx    ICQ: 96771783
http://tim.klingt.org

After one look at this planet any visitor from outer space would say
"I want to see the manager."
  William S. Burroughs

_______________________________________________
Linux-audio-user mailing list
Linux-audio-user@xxxxxxxxxxxxxxxxxxxx
http://lists.linuxaudio.org/mailman/listinfo.cgi/linux-audio-user





--
Mike Taht
PostCards From the Bleeding Edge
http://the-edge.blogspot.com