>>>> So i am not sure what part of Pulseaudio is causing high CPU Utilization
>>>> .. I can tell you that the fail points are mixing, software volume and
>>>> resampling.

> Hmm, that function is not optimized in any way, but if I look on its
> sources doesn't appear that slow to me either. For each sample we do
> one multiplication, one shifting, we apply saturation and then we
> increase/decrease pointers with wrap around. That shouldn't be that
> bad. Also, this code goes once linearly through all samples, which should
> minimize influence of the cache.

There is also an array lookup of the channel volume (on every iteration
of the for loop), and two increment variables. On an ARM processor this
is probably enough extra variables to exceed the number of available
registers and force spills to the stack. The easiest thing would be to
process one channel at a time, incrementing your pointer by the channel
stride and using a pointer to the end of the array as the stop condition
instead of keeping two count variables. I also hope you have
optimizations turned on in your compiler, or you will get a divide
instead of a shift.

It definitely is possible to run pulseaudio efficiently on an ARM
processor. Take a look at this for example:
http://developer.garmin.com/linux/nuvi-8xx-series/

I've been working on a modified version of pa_mix for my particular ARM
that should be faster. It basically only works for S16 samples and
doesn't do 2-channel volume, but here is a little of it.

You need to modify pa_render to ignore the streams = 1 case and always
use pa_mix. Then this is your pa_mix function:

    #define MAX_STREAMS 8

    size_t pa_mix(
        const pa_mix_info streams[],
        unsigned nstreams,
        void *data,
        size_t length,
        const pa_sample_spec *spec,
        const pa_cvolume *volume,
        int mute)
    {
        assert(streams && data && length && spec);

        uint16_t scale_value[MAX_STREAMS];
        int16_t *buffer_pointer[MAX_STREAMS];

        for (unsigned i = 0; i < nstreams && i < MAX_STREAMS; i++) {
            buffer_pointer[i] = (int16_t*) ((uint8_t*) streams[i].chunk.memblock->data
                                            + streams[i].chunk.index);

            /* Only mix as many bytes as the shortest stream provides. */
            if (streams[i].chunk.length < length)
                length = streams[i].chunk.length;

            /**
             * Scale linear software volumes to an exponential curve,
             * approximated here by raising x to the 2nd power
             *
             * We divide by 256 here because the lookup table was generated
             * at that granularity.
             */
            scale_value[i] = (uint16_t) pow2_table[
                (int) (streams[i].volume.values[0] / 256) ].v_linear_pow2;
        }

        /* fastmix takes samples not bytes */
        length = length / 2;

        switch (nstreams) {
        case 1:
            fast_mix1_overflow(data, length,
                buffer_pointer[0], scale_value[0]);
            break;
        case 2:
            fast_mix2_overflow(data, length,
                buffer_pointer[0], scale_value[0],
                buffer_pointer[1], scale_value[1]);
            break;
        case 3:
            fast_mix3_overflow(data, length,
                buffer_pointer[0], scale_value[0],
                buffer_pointer[1], scale_value[1],
                buffer_pointer[2], scale_value[2]);
            break;
        case 4:
            fast_mix4_overflow(data, length,
                buffer_pointer[0], scale_value[0],
                buffer_pointer[1], scale_value[1],
                buffer_pointer[2], scale_value[2],
                buffer_pointer[3], scale_value[3]);
            break;
        case 5:
            fast_mix5_overflow(data, length,
                buffer_pointer[0], scale_value[0],
                buffer_pointer[1], scale_value[1],
                buffer_pointer[2], scale_value[2],
                buffer_pointer[3], scale_value[3],
                buffer_pointer[4], scale_value[4]);
            break;
        case 6:
            fast_mix6_overflow(data, length,
                buffer_pointer[0], scale_value[0],
                buffer_pointer[1], scale_value[1],
                buffer_pointer[2], scale_value[2],
                buffer_pointer[3], scale_value[3],
                buffer_pointer[4], scale_value[4],
                buffer_pointer[5], scale_value[5]);
            break;
        case 7:
            fast_mix7_overflow(data, length,
                buffer_pointer[0], scale_value[0],
                buffer_pointer[1], scale_value[1],
                buffer_pointer[2], scale_value[2],
                buffer_pointer[3], scale_value[3],
                buffer_pointer[4], scale_value[4],
                buffer_pointer[5], scale_value[5],
                buffer_pointer[6], scale_value[6]);
            break;
        case 8:
            fast_mix8_overflow(data, length,
                buffer_pointer[0], scale_value[0],
                buffer_pointer[1], scale_value[1],
                buffer_pointer[2], scale_value[2],
                buffer_pointer[3], scale_value[3],
                buffer_pointer[4], scale_value[4],
                buffer_pointer[5], scale_value[5],
                buffer_pointer[6], scale_value[6],
                buffer_pointer[7], scale_value[7]);
            break;
        default:
            /* nstreams is 0 or larger than MAX_STREAMS */
            printf("ERROR!\n");
        }

        /* convert the sample count back to bytes for the caller */
        length = length * 2;

        return length;
    }

Then I have these functions for mixing (attached as fastmix.c)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fastmix.c
Type: text/x-csrc
Size: 6049 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/pulseaudio-discuss/attachments/20080730/8b7f39ad/attachment.c>
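
For readers without the attachment, the loop shape suggested earlier (one channel at a time, with a pointer comparison as the stop condition instead of two counters) and the kind of saturating mix a two-stream helper might perform can be sketched roughly as below. This is only a minimal sketch and not the contents of fastmix.c: the names sat16, scale_channel_s16 and mix2_s16_sat are hypothetical, and the 16.16 fixed-point gain format (0x10000 meaning unity) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide intermediate value to the signed 16-bit sample range. */
static inline int16_t sat16(int64_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t) v;
}

/* Scale one channel of an interleaved S16 buffer in place.
 * gain is 16.16 fixed point (0x10000 == unity); that format is an
 * assumption of this sketch. The loop keeps a single moving pointer and
 * compares it against the last sample of the channel, so no separate
 * sample or channel counters are live inside the loop. */
void scale_channel_s16(int16_t *buf, size_t frames, unsigned channels,
                       unsigned channel, int32_t gain) {
    assert(buf && channels > 0 && channel < channels);
    if (frames == 0)
        return;

    int16_t *p = buf + channel;
    int16_t *last = p + (frames - 1) * channels; /* last sample of this channel */

    for (;;) {
        /* One multiply, one shift, one saturation per sample. Right-shifting
         * a negative value is implementation-defined in C; GCC and Clang
         * perform an arithmetic shift. */
        *p = sat16(((int64_t) *p * gain) >> 16);
        if (p == last)
            break;
        p += channels;
    }
}

/* Hypothetical two-stream mixer: out[i] = sat(a[i]*gain_a + b[i]*gain_b).
 * samples is a count of 16-bit samples, not bytes. */
void mix2_s16_sat(int16_t *out, size_t samples,
                  const int16_t *a, int32_t gain_a,
                  const int16_t *b, int32_t gain_b) {
    int16_t *end = out + samples; /* one past the last output sample */

    while (out < end) {
        int64_t acc = (int64_t) *a++ * gain_a + (int64_t) *b++ * gain_b;
        *out++ = sat16(acc >> 16);
    }
}
```

Whether this actually keeps everything in registers depends on the compiler and the particular ARM core, so it is worth checking the generated assembly with optimizations enabled.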