preliminary benchmarking on Intel i5-2400S, 64-bit, Linux 3.13:
running 'paplay --latency-msec=10 stereo_48KHz.wav', output on the
internal soundcard (Intel HDA), measuring the maximum CPU% in top for
pulseaudio (PA) and paplay

code              flags          PA        paplay
master 6d1fd4d1   -O2            < 14.0%   < 3.7%
master 6d1fd4d1   -O2 -DNDEBUG   < 13.3%   < 3.3%
proposed v3       -O2            <  8.3%   < 1.3%
proposed v3       -O2 -DNDEBUG   <  7.6%   < 1.3%

ARMv7 benchmarking soonish

this patch series aims to save memory allocations and some system calls
related to PA's client/server protocol implementation

v3 adds inlining and saves a snd_pcm_avail(); the v2 code is largely
unchanged (minibuffers are increased and better used)

patches 1 to 5 ('tagstruct:') introduce a new tagstruct type _APPENDED
which can hold tagstruct data up to a certain size; tagstructs are now
kept in a specific free-list -- this typically replaces two
malloc()/free()s with one flist push()/pop()

patches 6 to 9 ('packet:') make packets fixed-size (typically); packets
are kept in a specific free-list -- this replaces one malloc()/free()
with one flist push()/pop()

patches 10 to 14 ('pstream:') allow sending tagstructs directly to a
pstream without encapsulation in a packet -- this saves one flist
push()/pop()

patches 15 and 16 ('pstream') often save a read() call by reading more
than just the descriptor (up to 40 bytes, e.g. descriptor (20 bytes) +
shm info (16 bytes)); the idea is similar to b4342845d, "Optimize write
of smaller packages", but for read -- this trades some extra memcpy()
for a read(); in v3 the buffer size has been increased to 256 bytes

patch 17 ('iochannel') fixes a strange behaviour in iochannel/mainloop
that deleted the input_event with every read, which caused a rebuild of
the pollfds for every read()!
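
to illustrate the free-list idea behind the tagstruct and packet patches,
here is a minimal sketch of a fixed-size object pool; all names (pool,
pool_get(), pool_put(), pool_drain(), POOL_CAPACITY, OBJECT_SIZE) are
hypothetical, and unlike PA's pa_flist this sketch is single-threaded and
not lock-free -- it only shows how a push()/pop() pair can stand in for a
malloc()/free() pair once the pool has warmed up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT PulseAudio's pa_flist.
 * The sketch is deliberately single-threaded. */
#define POOL_CAPACITY 8      /* maximum number of cached objects */
#define OBJECT_SIZE   128    /* fixed size of every pooled object */

typedef struct pool {
    void *slots[POOL_CAPACITY];
    size_t n_cached;
} pool;

/* Pop a cached object if there is one, otherwise fall back to malloc(). */
static void *pool_get(pool *p) {
    assert(p);
    if (p->n_cached > 0)
        return p->slots[--p->n_cached];
    return malloc(OBJECT_SIZE);
}

/* Push the object back for reuse; free() it only when the pool is full. */
static void pool_put(pool *p, void *obj) {
    assert(p);
    if (!obj)
        return;
    if (p->n_cached < POOL_CAPACITY)
        p->slots[p->n_cached++] = obj;
    else
        free(obj);
}

/* Release everything still cached, e.g. at shutdown. */
static void pool_drain(pool *p) {
    assert(p);
    while (p->n_cached > 0)
        free(p->slots[--p->n_cached]);
}
```

after the first pool_get()/pool_put() round trip, further round trips reuse
the cached object and do not reach the allocator; the objects must be
fixed-size for this to work, which is why the packet patches make packets
fixed-size first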

patches 18 to 20 ('queue', 'pstream') aim to combine two (v3: or more)
write items into one minibuffer by peeking ahead in the send queue

patch 21 stops calling mainloop's defer_enable() after queuing a
SHMRELEASE; this increases the chance that items can be combined (i.e.
by patch 20)

patch 22 inlines pa_run_once() as this function came out high in
profiling

patches 23 and 24 ('rtpoll') are cleanup

patch 25 ('mainloop') only clears the wakeup pipe when poll() indicates
that the pipe is readable; if the only ready file descriptor is the
wakeup pipe, searching io_events can be avoided

patches 26 and 27 ('flist') remove the volatile annotation and make
flist_elem attributes non-atomic -- needed?

v3 material:

patches 28 to 31 annotate some branches in rtpoll, mainloop and the ALSA
sink/source thread_func() and save two rtclock() calls

patch 32 ('resampler') is cleanup

patch 33 ('build-sys') adds --disable-statistics to configure

patches 34 to 37 make several hot functions inlineable; API functions in
pulse/ do a lot of error checking which is unnecessary in the core;
worse, the checking does NOT go away with NDEBUG

patch 38 ('resampler') precomputes the maximum block size in frames

patches 39 to 42 ('mix') make functions inlineable and clean up

patches 43 and 44 make volume-related functions inlineable

patches 45 and 46 ('iochannel', 'asyncmsgq') drop dead code

patch 47 fixes sink_input_pop_cb() to return the entire memchunk (as per
specification)

patch 48 saves one call to snd_pcm_avail() by computing left_to_play --
this patch has probably THE BIGGEST impact

patches 49 to 51 are cleanup and refactoring

summary: with these patches typical playback (i.e.
after setup) runs without any malloc()/free() thanks to the use of
free-lists, and the number of memory management operations is reduced;
many hot functions have been made inlineable, and redundant checks can
be dropped by compiling with NDEBUG=1;
read() and write() syscalls are saved by combining data into
minibuffers;
one call to snd_pcm_avail() is saved per mmap_write()

Peter Meerwald (51):
  tagstruct: Distinguish pa_tagstruct_new() use cases
  tagstruct: Replace dynamic flag with type
  tagstruct: Get rid of pa_tagstruct_free_data()
  tagstruct: Add type _APPENDED
  tagstruct: Use flist to potentially save calls to malloc()/free()
  packet: Hide internals of pa_packet, introduce pa_packet_data()
  packet: Make pa_packet_new() create fixed-size packets
  packet: Introduce pa_packet_new_data() to copy data into a newly
    created packet
  packet: Use flist to save calls to malloc()/free()
  pstream: Unionize item_info
  pstream: Add pa_pstream_send_tagstruct()
  pstream: #define PA_PSTREAM_SHM_SIZE
  pstream: Duplicate assignment, write.data is always NULL
  pstream: Only reset memchunk if it has been used
  pstream: Split up do_read()
  pstream: Use small minibuffer to combine several read()s if possible
  iochannel: Fix channel enable
  queue: Add pa_queue_peek() function
  pstream: Add helper functions reset_descriptor(), shm_descriptor()
  pstream: Peek into next item on send queue to see if it can be put
    into minibuffer together with current item
  pstream: Don't call defer_enable() on SHMRELEASE
  once: Inline functions
  rtpoll: Fix condition for DEBUG_TIMING output
  rtpoll: Drop extra wait_op argument to pa_rtpoll_run()
  mainloop: Clear wakeup pipe only when necessary
  flist: Don't use atomic operations to manipulate ptr, next
  flist: Don't make flist volatile
  rtpoll: Annotate branches with LIKELY
  mainloop: Annotate branches with LIKELY
  alsa: Make rtpoll_run() runtime measurement compile-time code,
    default off
  alsa: Annotate branches in ALSA sink/source thread_func() with LIKELY
  resampler: Drop pointless remix variable
  build-sys: Add --disable-statistics
  sample: Make pa_sample_size_table public
  sample: Make pa_channels_valid() inlineable
  sample-util: Add inlineable functions
  core: Make use of use inlineable macros
  resampler: Precompute maximum block size in frames
  mix: Make use of pa_cvolume_is_norm/muted() macros
  mix: Avoid redundant cvolume checks
  mix: pa_mix() is always called with more than one steam
  mix: Length over all chunk has already been computed by the caller
  core: Add volume-util.h
  core: Make use of volume macros
  iochannel: Remove unnecessary zero-initialization
  asyncmsgq: Drop weird assert
  protocol-native: Make sink_input_pop_cb() return entire chunk
  alsa-sink: Assume left_to_play can be computed, save one call to
    snd_pcm_avail()
  alsa: Refactor computation of sleep usec
  alsa: Precompute max_frames
  alsa: Remove redundant sample_spec parameter to reset_watermark()
    function

 configure.ac | 13 +-
 src/modules/alsa/alsa-mixer.c | 4 +-
 src/modules/alsa/alsa-sink.c | 187 +++----
 src/modules/alsa/alsa-source.c | 135 ++---
 src/modules/alsa/alsa-util.c | 32 +-
 src/modules/bluetooth/module-bluez4-device.c | 2 +-
 src/modules/bluetooth/module-bluez5-device.c | 2 +-
 src/modules/echo-cancel/module-echo-cancel.c | 42 +-
 src/modules/echo-cancel/webrtc.cc | 10 +-
 src/modules/module-card-restore.c | 4 +-
 src/modules/module-combine-sink.c | 2 +-
 src/modules/module-device-manager.c | 12 +-
 src/modules/module-device-restore.c | 16 +-
 src/modules/module-esound-sink.c | 2 +-
 src/modules/module-null-sink.c | 2 +-
 src/modules/module-null-source.c | 2 +-
 src/modules/module-pipe-sink.c | 2 +-
 src/modules/module-pipe-source.c | 2 +-
 src/modules/module-sine-source.c | 2 +-
 src/modules/module-stream-restore.c | 12 +-
 src/modules/module-tunnel.c | 54 +-
 src/modules/oss/module-oss.c | 2 +-
 src/modules/raop/module-raop-sink.c | 2 +-
 src/pulse/context.c | 29 +-
 src/pulse/ext-device-manager.c | 14 +-
 src/pulse/ext-device-restore.c | 10 +-
 src/pulse/ext-stream-restore.c | 10 +-
 src/pulse/introspect.c | 82 +--
 src/pulse/mainloop.c | 70 +--
 src/pulse/sample.c | 18 +-
 src/pulse/sample.h | 4 +-
 src/pulse/scache.c | 10 +-
 src/pulse/stream.c | 43 +-
 src/pulse/subscribe.c | 2 +-
 src/pulsecore/asyncmsgq.c | 2 -
 src/pulsecore/flist.c | 14 +-
 src/pulsecore/flist.h | 2 +-
 src/pulsecore/iochannel.c | 37 +-
 src/pulsecore/memblock.c | 15 +
 src/pulsecore/memblockq.c | 5 +-
 src/pulsecore/mix.c | 42 +-
 src/pulsecore/mix.h | 5 +
 src/pulsecore/once.c | 18 +-
 src/pulsecore/once.h | 25 +-
 src/pulsecore/packet.c | 55 +-
 src/pulsecore/packet.h | 20 +-
 src/pulsecore/pdispatch.c | 9 +-
 src/pulsecore/protocol-native.c | 162 +++---
 src/pulsecore/pstream-util.c | 33 +-
 src/pulsecore/pstream-util.h | 2 -
 src/pulsecore/pstream.c | 734 +++++++++++++++++---------
 src/pulsecore/pstream.h | 2 +
 src/pulsecore/queue.c | 11 +
 src/pulsecore/queue.h | 3 +
 src/pulsecore/resampler.c | 45 +-
 src/pulsecore/resampler.h | 3 +-
 src/pulsecore/rtpoll.c | 46 +-
 src/pulsecore/rtpoll.h | 5 +-
 src/pulsecore/sample-util.c | 8 +-
 src/pulsecore/sample-util.h | 53 ++
 src/pulsecore/sink-input.c | 13 +-
 src/pulsecore/sink.c | 23 +-
 src/pulsecore/source-output.c | 9 +-
 src/pulsecore/source.c | 13 +-
 src/pulsecore/tagstruct.c | 67 ++-
 src/pulsecore/tagstruct.h | 4 +-
 src/pulsecore/volume-util.h | 92 ++++
 src/tests/rtpoll-test.c | 4 +-
 src/tests/srbchannel-test.c | 21 +-
 69 files changed, 1455 insertions(+), 982 deletions(-)
 create mode 100644 src/pulsecore/volume-util.h

-- 
1.9.1