On Aug 10, 2009, at 9:59 AM, Sebastian Dröge wrote:

> On Monday, 10.08.2009, at 09:41 -0500, Rob Clark wrote:
>> 1) convert default processing functions to __attribute__((weak)) so
>>    they can be overridden with architecture-specific accelerated
>>    functions (i.e. NEON, MMX, Altivec, etc.)
>> 2) override gst_audio_quantize_quantize_signed_tpdf_none() to use
>>    NEON vector instructions
>> 3) override gst_audio_convert_unpack_float_le() to use NEON vector
>>    instructions
>>
>> This speeds up audioconvert ~10x, at least for the 32b float -> 16b
>> int conversion needed to play AC-3 audio (i.e. DVDs) via ALSA
>
> Hi,
> first of all, could you file a bug for this and attach the patch
> there? :)

[RC] Hi Sebastian, I just wanted to send the patch here because it might
be interesting to others working on ARM (armv7) based processors. A
liboil / orc based solution is probably the better long-term solution,
although I'm not sure of the current state of liboil / orc on armv7.
That, and I wanted an excuse to teach myself about NEON ;-)

So I don't know if you want to integrate this patch as-is, which is why
I didn't create an issue in bugzilla yet. I guess my next side project
is to learn a bit more about liboil / orc.

> and then some comments on the patch itself:
> - Don't use __attribute__((weak)), it's not portable. Instead use
>   liboil to detect at runtime if the CPU supports a specific
>   instruction set and then use the appropriate function pointer to
>   the unpack/quantize function

[RC] oh, darn.. it was such a clever trick too..

> - Add a configure check to see if the compiler supports the specific
>   instruction set and only compile that ARMv7 code then

[RC] I did put the whole file within a '#ifdef __ARM_NEON__ / #endif',
which should also work even if the compiler supports NEON but the user
doesn't give '-mfpu=neon'. But I admit that my configure-foo is weak,
so there is certainly a better way to do this.

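For illustration, a minimal sketch of the function-pointer approach
suggested above, combined with the '#ifdef __ARM_NEON__' guard. All
names here (convert_f32_to_s16_fn, cpu_has_neon(),
select_convert_f32_to_s16()) are hypothetical and are not taken from
the patch or from audioconvert. cpu_has_neon() is only a placeholder
for a real runtime check of the kind liboil is meant to provide. The
NEON path assumes finite input and uses vld1q_f32/vst1_s16, which need
only element alignment:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __ARM_NEON__
#include <arm_neon.h>
#endif

/* Hypothetical signature shared by every implementation. */
typedef void (*convert_f32_to_s16_fn)(int16_t *dst, const float *src, size_t n);

/* Portable fallback: scale to 16-bit range, clamp, truncate toward zero. */
static void convert_f32_to_s16_c(int16_t *dst, const float *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) {
        float v = src[i] * 32768.0f;
        if (v > 32767.0f) v = 32767.0f;
        if (v < -32768.0f) v = -32768.0f;
        dst[i] = (int16_t)v;
    }
}

#ifdef __ARM_NEON__
/* Placeholder (assumption): a real build would query the CPU at runtime. */
static int cpu_has_neon(void)
{
    return 1;
}

/* NEON path: four samples per iteration, scalar code for the tail. */
static void convert_f32_to_s16_neon(int16_t *dst, const float *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(src + i);
        /* float -> fixed point with 15 fractional bits (x 32768, truncating) */
        int32x4_t q = vcvtq_n_s32_f32(v, 15);
        /* saturating narrow from 32 to 16 bits */
        vst1_s16(dst + i, vqmovn_s32(q));
    }
    convert_f32_to_s16_c(dst + i, src + i, n - i);
}
#endif

/* Pick the implementation once; callers keep and reuse the pointer. */
static convert_f32_to_s16_fn select_convert_f32_to_s16(void)
{
#ifdef __ARM_NEON__
    if (cpu_has_neon())
        return convert_f32_to_s16_neon;
#endif
    return convert_f32_to_s16_c;
}
```

A caller would do something like
'convert_f32_to_s16_fn f = select_convert_f32_to_s16(); f(out, in, n);'
once at setup time, so the per-buffer path pays only an indirect call.
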
> - The start of a buffer might not be 16 byte aligned or whatever
>   alignment is required by VFP. It's only guaranteed to be aligned to
>   the sample type, i.e. 2 byte aligned for 16 bit samples, etc.

[RC] AFAIK, VLDR/VSTR doesn't require 128-bit alignment, although the
cycle count is lower for aligned accesses. So I guess it could be made
a bit faster by handling alignment a little better. As-is, it is a
night-and-day difference, and the gstaudioconvert related functions only
show up a couple of pages down in the oprofile output. Now it is liba52
that needs some optimization ;-)

> In general this patch is a good idea though, something like this
> really needs to go into audioconvert at critical places for other
> architectures too.
>
> FYI, David Schleef has partially converted audioconvert to use orc[0].
> Together with the orc VFP backend this would obsolete your patch I
> guess.
>
> [0] http://cgit.freedesktop.org/~ds/gst-plugins-base/log/?h=orc

[RC] ok, I'll check out his patch.. that is almost certainly the better
long-term approach. I just didn't know what the current state of ORC
for NEON/VFP was..

BR,
-R