On Sat, 2013-03-23 at 11:11 +0000, Toby Smithe wrote:
> Hi,
>
> I have a fairly free summer coming up, and thought it would be nice
> to participate in GSoC. For a while I've been interested in
> PulseAudio, and I have an idea for a project. I wonder if you might
> say whether you think it plausible.
>
> I use PulseAudio's native protocol streaming quite a lot, and I've
> noticed that it seems quite rudimentary. I read the code a couple of
> releases back, and it seems just to stream uncompressed PCM over
> TCP. With a wireless connection and multi-channel audio, this
> quickly becomes impractical, with drops and latency problems. A
> while ago, I looked into implementing Opus compression for the
> network streams, but never had a chance. I think Opus would make the
> ideal codec because it is very flexible, recently ratified as an
> Internet standard, and can be remarkably lightweight (according to
> the official benchmarks).
>
> In doing this network audio work, I might also be able to move on to
> auxiliary tasks like improving the GUI tools for this use-case.
>
> Do you think this might work?

Having a means of doing non-PCM streaming would definitely be
desirable. That said, I'm a bit wary of introducing codec and RTP
dependencies in PulseAudio (currently we have this at only one point,
which is the Bluetooth modules - something I don't see a way around).

There are two main concerns:

1. Codecs: choosing a codec is not simple. There are always potential
   reasons to prefer one over another - CPU utilisation, bandwidth,
   codec latency, quality, specific implementation (libav vs. the
   reference implementation vs. hardware acceleration) and so on.

2. RTP: as our usage gets more complicated, we're going to end up
   implementing and maintaining a non-trivial RTP stack, which is
   actually quite hard.

Deciding where to draw the line with regard to what does and does not
belong in PulseAudio is a bit tricky, but in my mind, encoding and
decoding should very much not be in PulseAudio, because that beast
inevitably gets more complicated as you try to do more, and there are
others solving the problem system-wide. As for RTP, I can see a case
for it being in PulseAudio, but it is also complicated, and as with
codecs, there are other places in the system where it gets more
attention and maintenance.

The simplest idea I can think of to deal with this meaningfully is to
wrap a sink/source around a GStreamer pipeline, offloading all the
work that we don't want to duplicate in PulseAudio. (I've appended
rough, untested sketches of both directions at the end of this mail.)

On the sink side, we'd hook up to an appsrc to feed PCM data to a
pipeline. The pipeline would take care of encoding, RTP packetisation
and possibly a corresponding RTCP stream. This would keep codec
selection flexible, and in the distant future, could even support
taking encoded data directly.

On the source side, we'd hook up to an appsink, receiving PCM data
from the pipeline. The pipeline would take care of decoding whatever
the format is, handle RTCP, and maybe provide more advanced features
such as a jitter buffer and packet-loss concealment (all of this can
be plugged in or not, depending on configuration).

Doing it this way means we'd be using a better RTP stack, one that
gets attention from a number of other use cases (plus assorted
related goodies), and we'd get support for multiple codecs.

Thoughts?

-- Arun
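
Sketch 1, the sink direction: a minimal, untested example of pushing
PCM into a pipeline that does Opus encoding and RTP payloading -
roughly what a GStreamer-backed sink module's render loop would do.
This assumes GStreamer 1.0 with the Opus elements available (opusenc
and rtpopuspay live in gst-plugins-bad at the moment); the element
name "pa-src", the host/port and the sample spec are all made up for
illustration. A real implementation would probably use rtpbin here to
get the RTCP stream I mentioned above for free.

/* build: gcc sketch-sink.c $(pkg-config --cflags --libs gstreamer-app-1.0) */
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);

    /* Encoding, payloading and transport all live in the pipeline, so
     * changing codecs means changing this string, not PulseAudio code. */
    GstElement *pipeline = gst_parse_launch(
        "appsrc name=pa-src format=time do-timestamp=true "
        "! audioconvert ! audioresample "
        "! opusenc ! rtpopuspay ! udpsink host=192.168.0.42 port=5004",
        NULL);
    GstElement *src = gst_bin_get_by_name(GST_BIN(pipeline), "pa-src");

    /* Describe the PCM we'll push; in a real module this would come
     * from the sink's sample spec. */
    GstCaps *caps = gst_caps_new_simple("audio/x-raw",
            "format", G_TYPE_STRING, "S16LE",
            "layout", G_TYPE_STRING, "interleaved",
            "rate", G_TYPE_INT, 48000,
            "channels", G_TYPE_INT, 2,
            NULL);
    g_object_set(src, "caps", caps, NULL);
    gst_caps_unref(caps);

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    /* Stand-in for the render loop: push 10ms of silence. The real
     * module would push each chunk the core hands it. */
    gsize size = 48000 / 100 * 2 * sizeof(gint16);
    GstBuffer *buf = gst_buffer_new_allocate(NULL, size, NULL);
    gst_buffer_memset(buf, 0, 0, size);
    gst_app_src_push_buffer(GST_APP_SRC(src), buf); /* takes ownership */

    gst_app_src_end_of_stream(GST_APP_SRC(src));
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(src);
    gst_object_unref(pipeline);
    return 0;
}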
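
Sketch 2, the source direction, mirroring the sender above and equally
untested: the pipeline does the RTP depayloading, jitter buffering and
Opus decoding, and a PulseAudio source pulls decoded PCM out of an
appsink. The jitter buffer is just another element here, so it can be
plugged in or left out via the pipeline description, and if I remember
right opusdec can do packet-loss concealment itself (it has a plc
property). The caps on udpsrc (payload type 96, 48kHz clock) are what
I'd expect the sender side to produce, but treat them as assumptions.

/* build: gcc sketch-source.c $(pkg-config --cflags --libs gstreamer-app-1.0) */
#include <gst/gst.h>
#include <gst/app/gstappsink.h>

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);

    /* Depayloading, jitter buffering and decoding all happen in the
     * pipeline; the udpsrc caps tell it what RTP to expect. */
    GstElement *pipeline = gst_parse_launch(
        "udpsrc port=5004 caps=\"application/x-rtp,media=audio,"
        "clock-rate=48000,encoding-name=OPUS,payload=96\" "
        "! rtpjitterbuffer ! rtpopusdepay ! opusdec ! audioconvert "
        "! appsink name=pa-sink",
        NULL);
    GstElement *sink = gst_bin_get_by_name(GST_BIN(pipeline), "pa-sink");

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    /* Stand-in for the source's thread: block for one decoded PCM
     * buffer; a real module would post this into PulseAudio instead
     * of just counting the bytes. */
    GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(sink));
    if (sample) {
        GstBuffer *buf = gst_sample_get_buffer(sample);
        GstMapInfo info;
        if (gst_buffer_map(buf, &info, GST_MAP_READ)) {
            g_print("got %" G_GSIZE_FORMAT " bytes of PCM\n", info.size);
            gst_buffer_unmap(buf, &info);
        }
        gst_sample_unref(sample);
    }

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(sink);
    gst_object_unref(pipeline);
    return 0;
}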