Re: [PATCH/RFC] SIMD optimizations for SBC encoder analysis filter

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Siarhei,

> > > This is a preliminary preview of SIMD optimizations for SBC encoder
> > > analysis filter.
> > >
> > > It already contains MMX optimization for 4 subbands case (yes, all this
> > > insane amount of extra lines of code finally starts to pay off) ;)
> > >
> > > Important notice: in order to test MMX optimizations, you need to have
> > > extra '-mmmx' command line option passed to gcc. Runtime MMX
> > > autodetection can be easily added later. Also don't forget to pass -s4
> > > option to sbcenc because 8 subbands case is still not accelerated. By the
> > > way, SSE2 is twice wider than MMX and should be a lot faster. Though MMX
> > > is supported on virtually every x86 cpu that is in use nowadays and can
> > > be considered "lowest common denominator".
> > >
> > > My quick benchmark showed that the performance gets improved about ~10%
> > > overall (and about twice better for the analysis filter function alone)
> > > when compared with bluez-4.23 release which had the old buggy code.
> > > Improvement is much more noticeable over the release 4.25 which contains
> > > a new fixed and mostly nonoptimized filter.
> > >
> > > So now the performance is better than ever. And I guess, all the
> > > platforms should use SIMD optimizations nowadays, so they should gain
> > > performance improvements too. Those 'anamatrix' style optimizations in
> > > older code feel so much like the previous century ;)
> > >
> > > I'm going to primarily focus on NEON and maybe ARMv6 SIMD optimizations,
> > > these will be submitted a bit later. Also, as I have already written
> > > before, the other parts of code are quite inefficient too and can be
> > > optimized. There are still lots of things to improve.
> > >
> > >
> > > But right now I would like to hear some opinions about the following
> > > things regarding the attached patch:
> > >
> > > The first question is about the use of extra source file for SIMD
> > > optimizations and introduction of
> > > 'sbc_encoder_init_simd_optimized_analyze' function to the global
> > > namespace. The rationale for that is the intention to stop adding changes
> > > to 'sbc.c' (otherwise it will become bloated pretty soon with the
> > > addition of multiple optimizations for various platforms). If anyone has
> > > a better idea, I'm very much interested to hear it.
> > >
> > > And if the addition of a new source file gets approved, I wonder about
> > > what text should go to the copyright header?
> > >
> > > Now we have two "reference" C implementations of analysis filter. Is it
> > > OK to keep both? Or only SIMD-friendly one should remain in the end?
> >
> > I am fine with keeping both, but if one is just not useful, we are going
> > to remove it.
> 
> The only problem with SIMD-friendly code is that it uses two tables instead of
> one (that's a sacrifice for the nice and symmetric code layout which fits SIMD
> instructions of modern processors quite well). It may be somewhat less
> optimal for the legacy processors without SIMD capabilities.
> 
> I wonder what CPU architectures are the most important for bluez?
> 
> > Also two separate files are fine for me. Personally I prefer a runtime
> > selection since compile time options are always painful 
> > to test before making the release.
> 
> The attached patch contains what I would consider to be a final variant. MMX
> support is now complete. It works for both x86 and amd64, has runtime
> autodetection of MMX availability, supports 4 and 8 subbands cases. I also
> ensured that only original MMX instructions are used (and no SSE or other
> later additions), so the code should work fine even on the old Pentium1 MMX.
> New MMX optimized functions produce bit identical results when compared
> with bluez-4.25 release.
> 
> With this patch applied, new filtering functions are noticeably faster than
> than the old ones on x86 (so now they are both faster and have better
> quality). Assembly optimizations for the other platforms can be easily
> added too. 

can you re-base your patch against the latest tree and re-send the
patch.

Do we still need the high precession stuff. I wanna cut down the number
of ifdefs in the code as much as possible.

Regards

Marcel


--
To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux