GSoC: Call for project ideas

tsmithe@xxxxxxxxxx (Toby St Clere Smithe) · Mon, 25 Mar 2013 16:19:32 +0000

Hi Arun,

Thanks for your message. I'm going to reply to you first, and then reply
to David.

Arun Raghavan <arun.raghavan at collabora.co.uk> writes:
> Having means of doing non-PCM streaming would definitely be desirable.
> That said, though, I'm a bit wary of introducing codec and RTP
> dependencies in PulseAudio (currently we have this at only one point,
> which is the Bluetooth modules - something I don't see a way around).
>
> Now there are two main concerns:
>
> 1. Codecs: choosing a codec is not simple. There are always potential
> reasons to look at one vs. the other - CPU utilisation, bandwidth, codec
> latency, quality, specific implementation (libav vs. reference
> implementation vs. hardware acceleration) and so on.

Indeed. I think a codec-agnostic implementation is important, as I
mentioned in an earlier message. I chose Opus for the very purposes it
was designed for, and I appreciate the desire for some lossless codec
too. But I hadn't spent much time thinking about implementation
differences, mainly because at the time I did my research, there was
only one implementation!

Your ideas about using GStreamer are interesting, but I don't know much
about how a GStreamer pipeline would fit into PulseAudio's pipeline, and
how this would affect things like latency. Nonetheless, you're certainly
right about the maintenance burden, and I am a fan of GStreamer in any
case. To have its power in PulseAudio would be very interesting, and
certainly worth the research I would need to do to understand (say) the
effect on latency.

> 2. RTP: as our usage gets more complicated, we're going to end up
> implementing and maintaining a non-trivial RTP stack, which is actually
> quite hard.
>
> Deciding where to draw the line with regards to what does and does not
> belong in PulseAudio is a bit tricky, but in my mind, encoding/decoding
> should very much not be in PulseAudio because that beast inevitably gets
> more complicated as you try to do more, and there are others solving the
> problem system-wide.
>
> RTP, I can see a case for it being in PulseAudio, but it is also
> complicated, and as with codecs, there are other places in the system
> where it gets more attention and maintenance.

David asked in his other messages about whether my plans were for RTP or
for the native protocol. I'll describe my thinking here, too.

Originally, I had looked at writing code for implementing these codecs
in the native protocol, because I wanted to build the support as close
to the core of PulseAudio as possible; doing so seemed to be the
easiest way to keep on top of the logic for adjusting bitrate etc.
according to latency and frame drops, and I also felt that working in
PulseAudio's own protocol would give us more coding liberty.

Now, however, I am largely in agreement that it is probably foolish to
maintain this whole stack in PulseAudio, and furthermore, it's probably
good to use the streaming protocols that are widely known and
understood, rather than duplicating the effort in a way that is likely
to be suboptimal.

Do you agree that using RTP rather than building the GStreamer pipeline
closer to the core of PulseAudio is probably the best plan?

> The simplest idea I can think of to deal with this meaningfully is to
> wrap a sink/source around a GStreamer pipeline to offload all that work
> that we don't want to duplicate in PulseAudio.

Quite.

> On the sink side, we'd hook up to an appsrc to feed PCM data to a
> pipeline. The pipeline would take care of encoding, RTP packetisation
> and possibly a corresponding RTCP stream. This would allow codec
> selection to be flexible, and in the distant future, could even support
> taking encoded data directly.
>
> On the source side, we'd hook up to an appsink, receiving PCM data from
> the pipeline. The pipeline would take care of decoding whatever the format
> is, take care of RTCP and maybe more advanced features such as a jitter
> buffer and packet-loss concealment (all of this can be plugged in or
> not, depending on configuration).

This does sound like a good plan to solve the given problem, save for my
concerns about monitoring and integration above. My other thought, which
would not work in the context of using an appsrc/appsink, is that having
GStreamer built more closely into PulseAudio would also mean we could
offload things like resampling (which I noticed was discussed in another
GSoC-related thread) to the GStreamer pipeline, rather than having to
maintain those code paths, too.
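
To make the appsrc/appsink split concrete, here is a rough Python
sketch of the two pipeline descriptions as I understand them. It only
builds gst-launch-style strings and does not touch GStreamer itself.
The element names (appsrc, appsink, opusenc, opusdec, rtpopuspay,
rtpopusdepay, rtpjitterbuffer, udpsink, udpsrc) are real GStreamer
elements, but the function names, the choice of Opus, the UDP
transport, the host/port defaults, the payload type 96 and the
pa_in/pa_out element names are all my assumptions for illustration,
not anything agreed in this thread.

```python
# Hypothetical helpers: names, defaults and codec choice are assumptions.

# Assumed RTP caps for an Opus stream (dynamic payload type 96).
OPUS_RTP_CAPS = (
    "application/x-rtp,media=audio,clock-rate=48000,"
    "encoding-name=OPUS,payload=96"
)


def sink_pipeline(host="127.0.0.1", port=5004,
                  encoder="opusenc", payloader="rtpopuspay"):
    """Sink side: PulseAudio pushes PCM into appsrc; the pipeline
    encodes, RTP-packetises and sends. Swapping `encoder` and
    `payloader` is what keeps the design codec-agnostic."""
    elements = [
        "appsrc name=pa_in is-live=true format=time",
        "audioconvert",
        "audioresample",
        encoder,
        payloader,
        f"udpsink host={host} port={port}",
    ]
    return " ! ".join(elements)


def source_pipeline(port=5004, depayloader="rtpopusdepay",
                    decoder="opusdec", caps=OPUS_RTP_CAPS,
                    jitter_buffer=True):
    """Source side: the pipeline receives RTP, optionally buffers
    against jitter, decodes, and hands PCM to PulseAudio via appsink."""
    elements = [f'udpsrc port={port} caps="{caps}"']
    if jitter_buffer:
        # Optional stage, plugged in or not depending on configuration.
        elements.append("rtpjitterbuffer")
    elements += [
        depayloader,
        decoder,
        "audioconvert",
        "audioresample",
        "appsink name=pa_out",
    ]
    return " ! ".join(elements)


if __name__ == "__main__":
    print(sink_pipeline())
    print(source_pipeline())
```

The point of the sketch is only that PulseAudio would see PCM at both
ends, while codec and RTP choices reduce to which elements sit in the
middle. RTCP and packet-loss concealment are left out here.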

> Doing it this way means you're using a better RTP stack that gets
> attention from a number of other use cases (plus assorted related
> goodies) and support for multiple codecs.

Right, indeed. But I'm still not entirely sure about RTP vs native, and
where the GStreamer code should go. My temptation is to couple
PulseAudio as closely as possible to GStreamer. But I thought about this
a little when doing my initial research, and came to the conclusion that
if this was a good idea, it would have been done already. What am I
missing?

Indeed, if we build GStreamer into pulsecore, then we could use RTP or
the native protocol, as we saw fit. If we went for the appsrc/sink
solution, then we'd be less flexible.

I haven't read the code in depth in a while, and I have exams soon. I
cannot start my research properly until they are out of the way in a
month or so, so I don't know how plausible this is. But it sounds quite
fun.

Cheers,

Toby