In low-latency scenarios, PulseAudio uses quite a bit of CPU. A while ago I did some profiling and noticed that much of the time was spent inside the ppoll syscall. I couldn't let go of that problem, and I think optimising PulseAudio is a good thing. So I went ahead and did some research, which ended up with a lock-free ringbuffer in shared memory, combined with eventfds for notification. I.e., I added a new channel, which I called "srchannel", in addition to the existing "iochannel", which usually uses UNIX pipes.

When running this solution with my low-latency test programs, I got the following results. The tests were done on my Core i3 laptop from 2010; I just ran top and tried to take an approximate average.

Reclatencytest: Recording test program. Asks for 10 ms of latency, ends up with a new packet every 5 ms.

With iochannel:
  PulseAudio main thread - 2.6% CPU
  Alsa-source thread - 1.7% CPU
  Reclatencytest - 2.6% CPU
  Total: 6.9% CPU

With srchannel:
  PulseAudio main thread - 2.3% CPU
  Alsa-source thread - 1.7% CPU
  Reclatencytest - 1.7% CPU
  Total: 5.3% CPU

I.e., CPU usage was reduced by ~25%.

Palatencytest: Playback test program. Asks for 20 ms of latency (I tried 10 ms, but it was too unstable), ends up with a new packet every 8 ms.

With iochannel:
  PulseAudio main thread - 2.3% CPU
  Alsa-sink thread - 2.2% CPU
  Palatencytest - 1.3% CPU
  Total: 5.8% CPU

With srchannel:
  PulseAudio main thread - 1.7% CPU
  Alsa-sink thread - 2.2% CPU
  Palatencytest - 1.0% CPU
  Total: 4.9% CPU

I.e., CPU usage was reduced by ~15%.

Now, this is not all there is to it. In a future generation of this patch, I'd like to investigate whether the client can listen to more than one ringbuffer, so that we can also set up a ringbuffer directly between the I/O thread and the client. That should lead to even bigger savings, and hopefully more stable audio as well (less jitter, since we would no longer pass through the non-RT main thread).
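To make the notification half of this concrete, here is a minimal sketch of an eventfd-based semaphore, roughly in the spirit of pa_fdsem. The fdsem_* names are hypothetical and this is not the real pa_fdsem API; it assumes Linux, since eventfd is Linux-specific. Each post adds 1 to the kernel counter, and a read returns the accumulated count and resets it to zero, so several posts can be consumed by a single wakeup.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical minimal wrapper around an eventfd (NOT the real pa_fdsem API). */
typedef struct {
    int efd;
} fdsem;

/* Create the eventfd with an initial counter of 0. Returns 0 on success. */
static int fdsem_init(fdsem *s) {
    s->efd = eventfd(0, EFD_CLOEXEC);
    return s->efd < 0 ? -1 : 0;
}

/* Signal the other side: add 1 to the eventfd counter, retrying on EINTR. */
static int fdsem_post(fdsem *s) {
    uint64_t one = 1;
    ssize_t r;
    do {
        r = write(s->efd, &one, sizeof one);
    } while (r < 0 && errno == EINTR);
    return r == (ssize_t) sizeof one ? 0 : -1;
}

/* Block until the counter is non-zero, then consume it.
 * Without EFD_SEMAPHORE the read returns the whole count and resets it to 0,
 * so the return value is the number of posts since the last wait (0 on error). */
static uint64_t fdsem_wait(fdsem *s) {
    uint64_t n = 0;
    ssize_t r;
    do {
        r = read(s->efd, &n, sizeof n);
    } while (r < 0 && errno == EINTR);
    return r == (ssize_t) sizeof n ? n : 0;
}

static void fdsem_done(fdsem *s) {
    close(s->efd);
}
```

Because the eventfd is an ordinary file descriptor, it can sit in the same poll set as the other fds of a thread, and it can be passed to another process over a UNIX socket.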
As for the implementation, I have a hacky/drafty patch which I'm happy to show to anyone interested. Here's how the patch works:

Setup:
1) A client connects and SHM is enabled as usual. (If SHM cannot be enabled, we can't enable the new srchannel either.)
2) The server allocates a new memblock for the two ringbuffers (one in each direction) and sends it to the client using the iochannel.
3) The server allocates two pa_fdsem objects (these are wrappers around eventfd).
4) The server prepares an additional packet for the client, with a new command PA_COMMAND_ENABLE_RINGBUFFER.
5) The server attaches the eventfds to the packet. Much like we do with pa_creds today, file descriptors can be passed over a UNIX socket using the mechanism described e.g. here [1].
6) The client receives the memblock and then the packet with the eventfds.
7) From that moment on, both client and server use the ringbuffer for all packets (except when they need to send additional pa_creds or eventfds, which have to go over the iochannel).

The shared memblock contains two ringbuffers. Atomic variables control the lock-free ringbuffers, so the memblock has to be writable by both sides. (As a quick hack, I just enabled both sides to write to all memblocks.) The two ringbuffer objects are encapsulated by an srchannel object, which looks just like the iochannel to the outside world.

Writing to an srchannel first writes to the ringbuffer memory, then increases the atomic "count" variable, and finally signals the pa_fdsem. On the reader side, that signal wakes up the reader, which reads the ringbuffer's memory and decreases "count".

The pstream object has been modified to be able to read from both an srchannel and an iochannel (in parallel), and writing can go to either channel depending on circumstances.

Okay, so this was a fun project and it seems promising. How do you feel I should proceed with it?
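Step 5 relies on standard SCM_RIGHTS ancillary data over a UNIX domain socket. The sketch below shows the bare mechanism for a single descriptor; send_fd and recv_fd are hypothetical helper names, not PulseAudio functions, and a real implementation would attach the fds to an existing packet rather than a dummy one-byte payload.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd as SCM_RIGHTS ancillary data.
 * At least one byte of regular data must accompany it on a stream socket. */
static int send_fd(int sock, int fd) {
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                       /* union guarantees cmsghdr alignment */
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    memset(&u, 0, sizeof u);
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd. The kernel installs it as a NEW
 * descriptor in the receiving process. Returns that fd, or -1 on failure. */
static int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS ||
        c->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

The received number generally differs from the sender's, but it refers to the same open file, so a write through the received fd of an eventfd would wake a reader polling the original.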
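The ringbuffer protocol described above (write the data, then increase the atomic "count"; the reader reads, then decreases "count") can be sketched as a single-producer/single-consumer buffer. This is an illustrative sketch, not PulseAudio's actual ringbuffer code: shared_ring, ring_end, ring_write, ring_read and RB_CAPACITY are hypothetical names. It assumes that only "count" and the data live in the shared memblock, while each side keeps its own read or write position privately, and it leaves out the pa_fdsem signalling. For cross-process use, the atomic must be lock-free on the platform (atomic_is_lock_free can check this).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RB_CAPACITY 64  /* hypothetical size, kept small for the example */

/* Would live in the shared memblock; both sides write to `count`. */
typedef struct {
    atomic_size_t count;          /* bytes written but not yet read */
    uint8_t data[RB_CAPACITY];
} shared_ring;

/* Private to one side: only that side ever touches `pos`. */
typedef struct {
    shared_ring *ring;
    size_t pos;
} ring_end;

/* Producer: copy as much as fits, then publish it by increasing count.
 * (In the srchannel design the writer would signal the pa_fdsem here.) */
static size_t ring_write(ring_end *w, const void *src, size_t len) {
    /* acquire: the reader has finished with any space it released */
    size_t used = atomic_load_explicit(&w->ring->count, memory_order_acquire);
    size_t space = RB_CAPACITY - used;
    if (len > space)
        len = space;
    const uint8_t *p = src;
    for (size_t i = 0; i < len; i++) {
        w->ring->data[w->pos] = p[i];
        w->pos = (w->pos + 1) % RB_CAPACITY;
    }
    /* release: data is visible before the new count is */
    atomic_fetch_add_explicit(&w->ring->count, len, memory_order_release);
    return len;
}

/* Consumer: read up to len published bytes, then decrease count. */
static size_t ring_read(ring_end *r, void *dst, size_t len) {
    /* acquire: pairs with the writer's release, so the data is visible */
    size_t avail = atomic_load_explicit(&r->ring->count, memory_order_acquire);
    if (len > avail)
        len = avail;
    uint8_t *p = dst;
    for (size_t i = 0; i < len; i++) {
        p[i] = r->ring->data[r->pos];
        r->pos = (r->pos + 1) % RB_CAPACITY;
    }
    /* release: our reads complete before the space is handed back */
    atomic_fetch_sub_explicit(&r->ring->count, len, memory_order_release);
    return len;
}
```

Since only the writer ever increases "count" and only the reader ever decreases it, a single atomic counter is enough, with no locks on either side.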
I expect a response from you, perhaps along some of these lines:
1) Woohoo, this is great! Just make your patches upstreamable and I promise I'll review them right away!
2) Woohoo, this is great! But I don't have any time to review them, so just finish your patches up, and push them without review!
3) This is interesting, but I don't have any time to review them, so put your patches in a drawer for the foreseeable future.
4) This is interesting, but some reduced CPU usage in low-latency scenarios isn't worth the extra code to maintain. (And the extra 64K per client for the ringbuffers.)
5) I think the entire idea is bad, because...

-- 
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic

[1] http://keithp.com/blogs/fd-passing/