[PATCH RFCv3 00/51] optimizations

preliminary benchmarking on Intel i5-2400S, 64-bit, Linux 3.13:

running 'paplay --latency-msec=10 stereo_48KHz.wav', output on internal
soundcard (Intel HDA), measuring the maximum CPU% shown in top for the
pulseaudio and paplay processes

code             flags          PA       paplay
master 6d1fd4d1  -O2            < 14.0%  < 3.7%
master 6d1fd4d1  -O2 -DNDEBUG   < 13.3%  < 3.3%
proposed v3      -O2             < 8.3%  < 1.3%
proposed v3      -O2 -DNDEBUG    < 7.6%  < 1.3%

ARMv7 benchmarking soonish


this patch series aims to save memory allocations and some system calls
related to PA's client/server protocol implementation

v3 adds inlining and saves a snd_pcm_avail() call; the v2 code is largely
unchanged (minibuffers are larger and better used)


patches 1 to 5 ('tagstruct:') introduce a new tagstruct type _APPENDED
which can hold tagstruct data up to a certain size; tagstructs are now 
kept in a specific free-list -- this typically replaces two malloc()/free()s
with one flist push()/pop()

patches 6 to 9 ('packet:') make packets fixed-size (typically); packets are
kept in a specific free-list -- this replaces one malloc()/free() with one
flist push()/pop()
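
A minimal sketch of the free-list idea behind the tagstruct and packet
patches, assuming a single-threaded pool. PA's flist is lock-free and has a
different API; packet_new(), packet_release(), POOL_MAX and PACKET_SIZE are
names invented for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MAX 8       /* hypothetical cap on cached objects */
#define PACKET_SIZE 64   /* hypothetical fixed payload size */

typedef struct packet {
    struct packet *next;             /* link, used only while on the free list */
    size_t length;
    unsigned char data[PACKET_SIZE];
} packet;

static packet *free_list = NULL;
static unsigned free_count = 0;

/* Pop a cached packet if one is available, otherwise fall back to malloc(). */
static packet *packet_new(void) {
    packet *p = free_list;

    if (p) {
        free_list = p->next;
        free_count--;
    } else {
        p = malloc(sizeof(packet));
        if (!p)
            return NULL;
    }

    p->next = NULL;
    p->length = 0;
    return p;
}

/* Push the packet back onto the list; free() it only when the list is full. */
static void packet_release(packet *p) {
    assert(p);

    if (free_count < POOL_MAX) {
        p->next = free_list;
        free_list = p;
        free_count++;
    } else
        free(p);
}
```

Once the pool has warmed up, a new/release pair touches only the list head,
which is why fixed-size objects matter: any cached object fits any request.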

patches 10 to 14 ('pstream:') allow sending tagstructs directly to a pstream
without encapsulation in a packet -- this saves one flist push()/pop()

patches 15 and 16 ('pstream') often save a read() call by reading more than
just the descriptor (up to 40 bytes, e.g. descriptor (20 bytes) + shm
info (16 bytes)); the idea is similar to b4342845d, "Optimize write
of smaller packages", but for read -- this trades some extra memcpy() for
a read(); in v3 the buffer size has been increased to 256 bytes
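
The read-ahead idea can be sketched as follows. This is not the pstream
code: the reader struct and reader_get() are hypothetical, the descriptor
and buffer sizes are the ones quoted above, and the sketch assumes a
blocking file descriptor and ignores EINTR:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define DESCRIPTOR_SIZE 20   /* descriptor size quoted above */
#define MINIBUF_SIZE 256     /* v3 minibuffer size quoted above */

typedef struct reader {
    unsigned char buf[MINIBUF_SIZE];
    size_t fill;       /* bytes currently buffered */
    size_t pos;        /* next unread byte in buf */
    unsigned n_reads;  /* read() calls issued, kept for illustration */
} reader;

/* Copy n bytes to dst, calling read() only when the minibuffer is empty.
 * Returns 0 on success, -1 on EOF or error. */
int reader_get(reader *r, int fd, void *dst, size_t n) {
    unsigned char *out = dst;

    while (n > 0) {
        size_t avail, take;

        if (r->pos == r->fill) {
            /* Ask for a full minibuffer, not just the bytes needed now. */
            ssize_t got = read(fd, r->buf, sizeof(r->buf));
            if (got <= 0)
                return -1;
            r->n_reads++;
            r->fill = (size_t) got;
            r->pos = 0;
        }

        avail = r->fill - r->pos;
        take = n < avail ? n : avail;
        memcpy(out, r->buf + r->pos, take);   /* the extra memcpy() */
        r->pos += take;
        out += take;
        n -= take;
    }

    return 0;
}
```

If a 20-byte descriptor and 16 bytes of shm info arrive together, the first
reader_get() pulls both into the minibuffer and the second is served from
memory, so only one read() is issued.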

patch 17 ('iochannel') fixes a strange behaviour in iochannel/mainloop that
deleted the input_event with every read which caused a rebuild of the pollfds
for every read()!

patches 18 to 20 ('queue', 'pstream') aim to combine two (v3: or more) write items 
into one minibuffer by peeking ahead in the send queue

patch 21 stops calling mainloop's defer_enable() after queuing a SHMRELEASE;
this increases the chance that items can be combined (i.e. by patch 20)
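
A rough sketch of the peek-ahead combining described for the send queue,
assuming the queue is a plain array; item and pack_items() are hypothetical
names and the real code works on pa_queue and pstream item_info:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MINIBUF_SIZE 256

typedef struct item {
    const void *data;
    size_t length;
} item;

/* Pack as many consecutive items as fit into minibuf, starting at
 * items[*head]. Advances *head past the packed items and returns the number
 * of bytes that can go out in a single write(). */
size_t pack_items(const item *items, size_t n_items, size_t *head,
                  unsigned char minibuf[MINIBUF_SIZE]) {
    size_t used = 0;

    while (*head < n_items) {
        const item *next = &items[*head];   /* peek, do not dequeue yet */

        if (next->length > MINIBUF_SIZE - used)
            break;                          /* does not fit, leave it queued */

        memcpy(minibuf + used, next->data, next->length);
        used += next->length;
        (*head)++;                          /* now actually dequeue */
    }

    return used;
}
```

A return value of 0 with items still queued means the head item is larger
than the minibuffer and has to be written directly. The more items are
queued before the writer runs, the more can share one write(), which is the
motivation given for not calling defer_enable() right away.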

patch 22 inlines pa_run_once() as this function came out high in profiling

patches 23 and 24 ('rtpoll') are cleanup

patch 25 ('mainloop') only clears the wakeup pipe when poll() indicates that
the pipe is readable; if the only ready file descriptor is the wakeup pipe,
searching io_events can be avoided
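
A sketch of the wakeup-pipe handling, assuming the pipe's read end sits in
slot 0 of the pollfd array; handle_wakeup() is a hypothetical helper, not a
mainloop function:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* fds[0] is assumed to be the read end of the wakeup pipe and `ready` the
 * return value of poll(). The pipe is drained only if poll() flagged it as
 * readable. Returns the number of other ready descriptors; 0 means the
 * search through the I/O events can be skipped. */
int handle_wakeup(struct pollfd *fds, int ready) {
    if (fds[0].revents & POLLIN) {
        char c[16];
        ssize_t n = read(fds[0].fd, c, sizeof(c));  /* cannot block: POLLIN set */
        (void) n;

        fds[0].revents = 0;
        ready--;
    }

    return ready;
}
```

Without the revents check, every iteration would pay for a read() on the
pipe even when nothing was written to it.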

patches 26 and 27 ('flist') remove the volatile annotation and make
flist_elem attributes non-atomic -- needed?

v3 material:

patches 28 to 31 annotate some branches in rtpoll, mainloop and the ALSA
thread functions with LIKELY and save two rtclock() calls
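
For readers unfamiliar with branch annotation, a generic sketch using GCC's
__builtin_expect(); the LIKELY/UNLIKELY macros below are stand-ins defined
for this example and sum_nonnegative() is a made-up hot loop:

```c
#include <assert.h>

#if defined(__GNUC__)
#define LIKELY(x)   (__builtin_expect(!!(x), 1))
#define UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sums the array; the error path is marked as unlikely so the compiler can
 * lay out the common path as straight-line code. Returns -1 on a negative
 * element. */
long sum_nonnegative(const int *v, int n) {
    long sum = 0;

    for (int i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0))
            return -1;
        sum += v[i];
    }

    return sum;
}
```

The annotation does not change results, only code layout and static branch
prediction, so the gain has to be confirmed by profiling.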

patch 32 ('resampler') is cleanup

patch 33 ('build-sys') adds --disable-statistics to configure

patches 34 to 37 make several hot functions inlineable; API functions in
pulse/ do a lot of error checking which is unnecessary in the core; worse,
this checking does NOT go away with NDEBUG
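
To illustrate the difference, a sketch with invented types and names
(sample_format, sample_spec, frame_size_checked(), frame_size_fast()); it
mirrors the idea, not the actual pulse/ or pulsecore/ code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified sample description. */
typedef enum { FMT_U8, FMT_S16, FMT_S32, FMT_MAX } sample_format;

typedef struct {
    sample_format format;
    unsigned channels;
} sample_spec;

static const size_t sample_size_table[FMT_MAX] = { 1, 2, 4 };

/* Public-API style: validates its argument on every call, also with
 * NDEBUG, and lives in a .c file so it cannot be inlined across modules. */
size_t frame_size_checked(const sample_spec *s) {
    if (!s || s->format >= FMT_MAX || s->channels == 0)
        return 0;

    return sample_size_table[s->format] * s->channels;
}

/* Core-internal style: defined in a header, inlineable, and the check is an
 * assert() that compiles away when NDEBUG is defined. */
static inline size_t frame_size_fast(const sample_spec *s) {
    assert(s && s->format < FMT_MAX && s->channels > 0);

    return sample_size_table[s->format] * s->channels;
}
```

With -DNDEBUG the fast variant reduces to a table lookup and a multiply at
the call site, while the checked variant keeps its function call and
branches.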

patch 38 ('resampler') precomputes the maximum block size in frames

patches 39 to 42 ('mix') make functions inlineable and clean up

patches 43 and 44 make volume-related functions inlineable

patches 45 and 46 ('iochannel', 'asyncmsgq') drop dead code

patch 47 fixes sink_input_pop_cb() to return the entire memchunk (as per specification)

patch 48 saves one call to snd_pcm_avail() by computing left_to_play -- this patch
has probably THE BIGGEST impact

patches 49 to 51 are cleanup and refactoring


summary:

with these patches typical playback (i.e. after setup) runs without any malloc()/free()
thanks to the use of free-lists; the number of memory management operations is reduced

many hot functions have been made inlineable; redundant checks can be dropped
by compiling with -DNDEBUG

read() and write() syscalls are saved by combining data into minibuffers

one call to snd_pcm_avail() is saved per mmap_write()


Peter Meerwald (51):
  tagstruct: Distinguish pa_tagstruct_new() use cases
  tagstruct: Replace dynamic flag with type
  tagstruct: Get rid of pa_tagstruct_free_data()
  tagstruct: Add type _APPENDED
  tagstruct: Use flist to potentially save calls to malloc()/free()
  packet: Hide internals of pa_packet, introduce pa_packet_data()
  packet: Make pa_packet_new() create fixed-size packets
  packet: Introduce pa_packet_new_data() to copy data into a newly
    created packet
  packet: Use flist to save calls to malloc()/free()
  pstream: Unionize item_info
  pstream: Add pa_pstream_send_tagstruct()
  pstream: #define PA_PSTREAM_SHM_SIZE
  pstream: Duplicate assignment, write.data is always NULL
  pstream: Only reset memchunk if it has been used
  pstream: Split up do_read()
  pstream: Use small minibuffer to combine several read()s if possible
  iochannel: Fix channel enable
  queue: Add pa_queue_peek() function
  pstream: Add helper functions reset_descriptor(), shm_descriptor()
  pstream: Peek into next item on send queue to see if it can be put
    into minibuffer together with current item
  pstream: Don't call defer_enable() on SHMRELEASE
  once: Inline functions
  rtpoll: Fix condition for DEBUG_TIMING output
  rtpoll: Drop extra wait_op argument to pa_rtpoll_run()
  mainloop: Clear wakeup pipe only when necessary
  flist: Don't use atomic operations to manipulate ptr, next
  flist: Don't make flist volatile
  rtpoll: Annotate branches with LIKELY
  mainloop: Annotate branches with LIKELY
  alsa: Make rtpoll_run() runtime measurement compile-time code, default
    off
  alsa: Annotate branches in ALSA sink/source thread_func() with LIKELY
  resampler: Drop pointless remix variable
  build-sys: Add --disable-statistics
  sample: Make pa_sample_size_table public
  sample: Make pa_channels_valid() inlineable
  sample-util: Add inlineable functions
  core: Make use of use inlineable macros
  resampler: Precompute maximum block size in frames
  mix: Make use of pa_cvolume_is_norm/muted() macros
  mix: Avoid redundant cvolume checks
  mix: pa_mix() is always called with more than one steam
  mix: Length over all chunk has already been computed by the caller
  core: Add volume-util.h
  core: Make use of volume macros
  iochannel: Remove unnecessary zero-initialization
  asyncmsgq: Drop weird assert
  protocol-native: Make sink_input_pop_cb() return entire chunk
  alsa-sink: Assume left_to_play can be computed, save one call to
    snd_pcm_avail()
  alsa: Refactor computation of sleep usec
  alsa: Precompute max_frames
  alsa: Remove redundant sample_spec parameter to reset_watermark()
    function

 configure.ac                                 |  13 +-
 src/modules/alsa/alsa-mixer.c                |   4 +-
 src/modules/alsa/alsa-sink.c                 | 187 +++----
 src/modules/alsa/alsa-source.c               | 135 ++---
 src/modules/alsa/alsa-util.c                 |  32 +-
 src/modules/bluetooth/module-bluez4-device.c |   2 +-
 src/modules/bluetooth/module-bluez5-device.c |   2 +-
 src/modules/echo-cancel/module-echo-cancel.c |  42 +-
 src/modules/echo-cancel/webrtc.cc            |  10 +-
 src/modules/module-card-restore.c            |   4 +-
 src/modules/module-combine-sink.c            |   2 +-
 src/modules/module-device-manager.c          |  12 +-
 src/modules/module-device-restore.c          |  16 +-
 src/modules/module-esound-sink.c             |   2 +-
 src/modules/module-null-sink.c               |   2 +-
 src/modules/module-null-source.c             |   2 +-
 src/modules/module-pipe-sink.c               |   2 +-
 src/modules/module-pipe-source.c             |   2 +-
 src/modules/module-sine-source.c             |   2 +-
 src/modules/module-stream-restore.c          |  12 +-
 src/modules/module-tunnel.c                  |  54 +-
 src/modules/oss/module-oss.c                 |   2 +-
 src/modules/raop/module-raop-sink.c          |   2 +-
 src/pulse/context.c                          |  29 +-
 src/pulse/ext-device-manager.c               |  14 +-
 src/pulse/ext-device-restore.c               |  10 +-
 src/pulse/ext-stream-restore.c               |  10 +-
 src/pulse/introspect.c                       |  82 +--
 src/pulse/mainloop.c                         |  70 +--
 src/pulse/sample.c                           |  18 +-
 src/pulse/sample.h                           |   4 +-
 src/pulse/scache.c                           |  10 +-
 src/pulse/stream.c                           |  43 +-
 src/pulse/subscribe.c                        |   2 +-
 src/pulsecore/asyncmsgq.c                    |   2 -
 src/pulsecore/flist.c                        |  14 +-
 src/pulsecore/flist.h                        |   2 +-
 src/pulsecore/iochannel.c                    |  37 +-
 src/pulsecore/memblock.c                     |  15 +
 src/pulsecore/memblockq.c                    |   5 +-
 src/pulsecore/mix.c                          |  42 +-
 src/pulsecore/mix.h                          |   5 +
 src/pulsecore/once.c                         |  18 +-
 src/pulsecore/once.h                         |  25 +-
 src/pulsecore/packet.c                       |  55 +-
 src/pulsecore/packet.h                       |  20 +-
 src/pulsecore/pdispatch.c                    |   9 +-
 src/pulsecore/protocol-native.c              | 162 +++---
 src/pulsecore/pstream-util.c                 |  33 +-
 src/pulsecore/pstream-util.h                 |   2 -
 src/pulsecore/pstream.c                      | 734 +++++++++++++++++----------
 src/pulsecore/pstream.h                      |   2 +
 src/pulsecore/queue.c                        |  11 +
 src/pulsecore/queue.h                        |   3 +
 src/pulsecore/resampler.c                    |  45 +-
 src/pulsecore/resampler.h                    |   3 +-
 src/pulsecore/rtpoll.c                       |  46 +-
 src/pulsecore/rtpoll.h                       |   5 +-
 src/pulsecore/sample-util.c                  |   8 +-
 src/pulsecore/sample-util.h                  |  53 ++
 src/pulsecore/sink-input.c                   |  13 +-
 src/pulsecore/sink.c                         |  23 +-
 src/pulsecore/source-output.c                |   9 +-
 src/pulsecore/source.c                       |  13 +-
 src/pulsecore/tagstruct.c                    |  67 ++-
 src/pulsecore/tagstruct.h                    |   4 +-
 src/pulsecore/volume-util.h                  |  92 ++++
 src/tests/rtpoll-test.c                      |   4 +-
 src/tests/srbchannel-test.c                  |  21 +-
 69 files changed, 1455 insertions(+), 982 deletions(-)
 create mode 100644 src/pulsecore/volume-util.h

-- 
1.9.1


