On 14.07.2016 00:02, Tanu Kaskinen wrote: > "Refactoring" usually means changing the code without changing the > behaviour. A better commit title for this patch would be "redo latency > initialization logic". > > On Sun, 2016-06-05 at 21:05 +0200, Georg Chini wrote: >> Initialize the latency to sane values at startup or after source or sink switch. >> This ensures that there are no underruns during the initial phase and also that >> the overall latency starts near to the configured latency. > It would be nice to explain in more detail what problem this patch > solves - what was wrong with the previous code? Apparently there were > underruns during the initial phase and the overall latency didn't start > near to the configured latency, but what caused those issues? > >> Additionally audio will be dropped until pop() is called for the first time to >> prevent excessive delays. >> Sink or source latencies smaller than 2.5 ms proved to produce a lot of errors, >> so the minimum source or sink latency is limited to 2.5 ms. >> Current boundary values are saved for the calculation of the minimum possible >> latency in one of the following patches. >> >> --- >> src/modules/module-loopback.c | 207 +++++++++++++++++++++++++++++++++++++++--- >> 1 file changed, 196 insertions(+), 11 deletions(-) >> >> diff --git a/src/modules/module-loopback.c b/src/modules/module-loopback.c >> index 246d622..aafe41b 100644 >> --- a/src/modules/module-loopback.c >> +++ b/src/modules/module-loopback.c >> @@ -61,6 +61,8 @@ PA_MODULE_USAGE( >> >> #define MEMBLOCKQ_MAXLENGTH (1024*1024*32) >> >> +#define MIN_DEVICE_LATENCY (2.5*PA_USEC_PER_MSEC) >> + >> #define DEFAULT_ADJUST_TIME_USEC (10*PA_USEC_PER_SEC) >> >> struct userdata { >> @@ -76,15 +78,30 @@ struct userdata { >> pa_rtpoll_item *rtpoll_item_read, *rtpoll_item_write; >> >> pa_time_event *time_event; >> - pa_usec_t adjust_time; >> >> int64_t recv_counter; >> int64_t send_counter; >> >> size_t skip; >> + >> + /* Values from command line configuration */ >> pa_usec_t latency; >> + pa_usec_t adjust_time; >> >> + /* Latency boundaries and current values */ >> + pa_usec_t min_source_latency; >> + pa_usec_t max_source_latency; >> + int64_t source_offset; >> + pa_usec_t min_sink_latency; >> + pa_usec_t max_sink_latency; >> + int64_t sink_offset; > This patch doesn't use source_offset or sink_offset for anything, so it > doesn't look like a good idea to add them in this patch. The commit message says that I add some variables for later use. I thought it was more convenient here because this way all the boundary variables are introduced in one patch which makes the other patches better readable. I can however move the variables if you prefer. > > "source_latency_offset" and "sink_latency_offset" would be clearer > names. > > Also, the user may reconfigure the latency offsets at any time, so if > you're going to use the offsets for something, you'll need to monitor > them for changes. How can I track those changes? I don't see any event I could use. The offsets are later used to determine the minimum possible target latency. > >> + pa_usec_t configured_sink_latency; >> + pa_usec_t configured_source_latency; >> + >> + /* Various booleans */ >> bool in_pop; >> + bool pop_called; >> + bool fixed_alsa_source; > fixed_alsa_source is not used anywhere, so it's better to not add it in > this patch. > >> >> struct { >> int64_t send_counter; >> @@ -279,6 +296,46 @@ static void update_adjust_timer(struct userdata *u) { >> enable_adjust_timer(u, true); >> } >> >> +/* Called from all contexts > Well, that's not a safe thing to do. I agree that it is potentially unsafe. Unfortunately the procedure must be called from main and IO context. I believe it is safe in this special case because I prevent adding silence to the queue when calling from IO context. Is this OK for you? > >> + * Sets the memblockq to the requested latency corrected by offset */ >> +static void memblockq_adjust(struct userdata *u, int32_t offset, bool allow_push) { >> + size_t memblock_bytes, requested_buffer_bytes; >> + pa_usec_t requested_buffer_latency; >> + size_t buffer_offset; > There's already a variable named "offset". I think naming this > "bytes_to_drop" would make things a bit more readable. > >> + pa_memchunk silence; >> + >> + requested_buffer_latency = u->latency; >> + if (offset > 0) >> + requested_buffer_latency = PA_CLIP_SUB(requested_buffer_latency, (pa_usec_t)offset); >> + else >> + requested_buffer_latency = requested_buffer_latency - offset; >> + >> + requested_buffer_bytes = pa_usec_to_bytes(requested_buffer_latency, &u->sink_input->sample_spec); >> + memblock_bytes = pa_memblockq_get_length(u->memblockq); > You probably meant "memblockq_bytes" rather than "memblock_bytes". > Anyway, I'd prefer "memblockq_length", since this variable seems to be > just a shorthand for pa_memblockq_get_length(). > >> + >> + if ((int32_t)(memblock_bytes - requested_buffer_bytes) > 0) { > This cast seems dubious. Both variables are unsigned, so the > substraction result will be unsigned too. Perhaps this works fine in > practice (at least if size_t is a 32-bit type), since the result will > be larger than INT32_MAX, and will be reinterpreted as the correct > negative value. > > Casting memblock_bytes to ssize_t before the substraction would look > less wrong, at least to me. > >> + /* Drop audio from queue */ >> + buffer_offset = memblock_bytes - requested_buffer_bytes; >> + pa_log_info("Dropping %zd bytes from queue", buffer_offset); >> + pa_memblockq_drop(u->memblockq, buffer_offset); >> + >> + } else if ((int32_t)(memblock_bytes - requested_buffer_bytes) < 0 && allow_push) { > Same note about the casting as above. > >> + /* Add silence to queue, will never happen from IO-thread */ >> + requested_buffer_bytes = requested_buffer_bytes - memblock_bytes; > Seems like a good place to use the "-=" operator. Hmm... even better > might be to use a new variable, because this seems to redefine the > meaning of requested_buffer_bytes (after this the number does not any > more reflect what was requested). > >> + pa_log_info("Adding %zd bytes of silence to queue", requested_buffer_bytes); >> + pa_sink_input_get_silence(u->sink_input, &silence); >> + while (requested_buffer_bytes >= silence.length) { >> + pa_memblockq_push_align(u->memblockq, &silence); >> + requested_buffer_bytes -= silence.length; >> + } >> + if (requested_buffer_bytes > 0) { >> + silence.length = requested_buffer_bytes; >> + pa_memblockq_push_align(u->memblockq, &silence); >> + } > No need to use pa_memblockq_push_align(). The memchunk length is > already aligned to the sink input sample spec. > >> + pa_memblock_unref(silence.memblock); >> + } >> +} >> + >> /* Called from input thread context */ >> static void source_output_push_cb(pa_source_output *o, const pa_memchunk *chunk) { >> struct userdata *u; >> @@ -342,6 +399,57 @@ static int source_output_process_msg_cb(pa_msgobject *obj, int code, void *data, >> return pa_source_output_process_msg(obj, code, data, offset, chunk); >> } >> >> +/* Calculates minimum and maximum possible latency for source and sink */ >> +static void update_latency_boundaries(struct userdata *u, pa_source *source, pa_sink *sink) { >> + >> + if (source) { >> + /* Source latencies */ >> + u->fixed_alsa_source = false; >> + if (source->flags & PA_SOURCE_DYNAMIC_LATENCY) >> + pa_source_get_latency_range(source, &u->min_source_latency, &u->max_source_latency); >> + else { >> + u->min_source_latency = pa_source_get_fixed_latency(source); >> + u->max_source_latency = u->min_source_latency; >> + if (source->driver) { >> + if (pa_streq(source->driver, "module-alsa-card.c")) > An alsa source may also be created by module-alsa-source. It would be > better to check if the "device.api" property of the source is set to > "alsa". > >> + u->fixed_alsa_source = true; >> + } >> + } >> + /* Source offset */ >> + u->source_offset = source->latency_offset; >> + /* Latencies below 2.5 ms cause problems, limit source latency if possible */ >> + if (u->max_source_latency >= MIN_DEVICE_LATENCY) >> + u->min_source_latency = PA_MAX(u->min_source_latency, MIN_DEVICE_LATENCY); > If min_source_latency is 0.5 ms and max_source_latency is 2.4 ms, you > probably still want to increase min_source_latency to 2.4 ms? This code > doesn't do that. > >> + } >> + >> + if (sink) { >> + /* Sink latencies */ >> + if (sink->flags & PA_SINK_DYNAMIC_LATENCY) >> + pa_sink_get_latency_range(sink, &u->min_sink_latency, &u->max_sink_latency); >> + else { >> + u->min_sink_latency = pa_sink_get_fixed_latency(sink); >> + u->max_sink_latency = u->min_sink_latency; >> + } >> + /* Sink offset */ >> + u->sink_offset = sink->latency_offset; >> + /* Latencies below 2.5 ms cause problems, limit sink latency if possible */ >> + if (u->max_sink_latency >= MIN_DEVICE_LATENCY) >> + u->min_sink_latency = PA_MAX(u->min_sink_latency, MIN_DEVICE_LATENCY); > Same comment as above. > >> + } >> +} >> + >> +/* Set source output latency to one third of the overall latency if possible */ > It would be good to add a note that one third is a pretty arbitrary > choice. The overall latency shouldn't be smaller than the source > latency plus the sink latency, but other than that, the buffer size > choices are rather arbitrary, as far as I can tell. > >> +static void set_source_output_latency(struct userdata *u, pa_source *source) { >> + pa_usec_t latency, requested_latency; >> + >> + requested_latency = u->latency / 3; >> + >> + latency = PA_CLAMP(requested_latency , u->min_source_latency, u->max_source_latency); > There's an extra space before ",". > >> + u->configured_source_latency = pa_source_output_set_requested_latency(u->source_output, latency); >> + if (u->configured_source_latency != requested_latency) >> + pa_log_warn("Cannot set requested source latency of %0.2f ms, adjusting to %0.2f ms", (double)requested_latency / PA_USEC_PER_MSEC, (double)u->configured_source_latency / PA_USEC_PER_MSEC); > If the actual configured latency of either sink or source is higher > than expected, then it may be that the configured source latency plus > the configured sink latency is higher than the configured overall > latency. In this case the target overall latency should be pushed up, > so that we don't try to target an impossible latency. Restricting the target latency to (hopefully) sane values is done in another patch. > >> +} >> + >> /* Called from output thread context */ >> static void source_output_attach_cb(pa_source_output *o) { >> struct userdata *u; >> @@ -419,6 +527,7 @@ static void source_output_moving_cb(pa_source_output *o, pa_source *dest) { >> struct userdata *u; >> char *input_description; >> const char *n; >> + pa_usec_t sink_latency; >> >> if (!dest) >> return; >> @@ -435,6 +544,26 @@ static void source_output_moving_cb(pa_source_output *o, pa_source *dest) { >> if ((n = pa_proplist_gets(dest->proplist, PA_PROP_DEVICE_ICON_NAME))) >> pa_sink_input_set_property(u->sink_input, PA_PROP_DEVICE_ICON_NAME, n); >> >> + /* Set latency and calculate necessary buffer length >> + * In some profile switching situations the sink will be invalid here. If so, >> + * skip the buffer adjustment, it will be done when the sink becomes valid */ >> + >> + update_latency_boundaries(u, dest, NULL); >> + set_source_output_latency(u, dest); >> + >> + if (!u->sink_input->sink) >> + memblockq_adjust(u, 0, true); > The indentation is a bit off here. > >> + else { >> + pa_sink_input_get_latency(u->sink_input, &sink_latency); > Why not pa_sink_get_latency()? > >> + if (u->send_counter > u->recv_counter) >> + sink_latency += pa_bytes_to_usec(u->send_counter - u->recv_counter, &u->sink_input->sample_spec); >> + if (dest->flags & PA_SOURCE_DYNAMIC_LATENCY) >> + sink_latency += pa_source_get_latency(dest); >> + else >> + sink_latency = PA_CLIP_SUB(sink_latency, pa_source_get_latency(dest)); > sink_latency is not the sink latency after all these additions and > substractions... I think the variable should either be renamed to > something more accurate, or use multiple variables. > > Why do you substract the source latency from sink_latency if the source > has fixed latency? I would expect you to add the source latency in > either case. I believe that is a mistake I copied from previous versions, but I have to perform a few tests to confirm. > >> + memblockq_adjust(u, sink_latency, true); >> + } >> + >> if (pa_source_get_state(dest) == PA_SOURCE_SUSPENDED) >> pa_sink_input_cork(u->sink_input, true); >> else >> @@ -465,6 +594,11 @@ static int sink_input_pop_cb(pa_sink_input *i, size_t nbytes, pa_memchunk *chunk >> pa_assert_se(u = i->userdata); >> pa_assert(chunk); >> >> + u->pop_called = true; >> + >> + /* It is not clear why the message queue needs to be drained in the pop callback, >> + * but not doing so causes underruns at low latencies. If the message processed >> + * here is the end of an underrun, the corresponding rewind will not be executed */ >> u->in_pop = true; >> while (pa_asyncmsgq_process_one(u->asyncmsgq) > 0) >> ; >> @@ -514,10 +648,16 @@ static int sink_input_process_msg_cb(pa_msgobject *obj, int code, void *data, in >> >> pa_sink_input_assert_io_context(u->sink_input); >> >> - if (PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state)) >> - pa_memblockq_push_align(u->memblockq, chunk); >> - else >> - pa_memblockq_flush_write(u->memblockq, true); >> + pa_memblockq_push_align(u->memblockq, chunk); >> + u->recv_counter += (int64_t) chunk->length; >> + >> + /* If the sink is not yet playing, discard audio and adjust the buffer to contain >> + * somewhat more than the configured latency. Use a quarter of the current sink >> + * latency as safety margin */ >> + if (!PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state) || !u->pop_called) { >> + memblockq_adjust(u, (int32_t)(-u->configured_sink_latency / 4), false); > -u->configured_sink_latency is a very large unsigned number, divide it > by four and it's still huge. It happens that dividing by four somehow > makes the result correct due to the 64-bit -> 32-bit conversion (I > don't exactly understand how that works, but I tried it with a simple > test program), but dividing by e.g. 5 would yield an unexpected result. > This code definitely needs fixing. Uh ... > >> + return 0; >> + } > The comment says "discard audio", but the posted chunk was pushed to > the memblockq. I guess you meant discarding old audio in case the > memblockq was already "full". In that case, try to make the comment > sound like you're not discarding the received chunk. > > What's the "safety margin" good for? Why would you ever want to buffer > more than what the target latency is? See below. > >> >> /* Is this the end of an underrun? Then let's start things >> * right-away */ >> @@ -531,18 +671,20 @@ static int sink_input_process_msg_cb(pa_msgobject *obj, int code, void *data, in >> false, true, false); >> } >> >> - u->recv_counter += (int64_t) chunk->length; >> - >> return 0; >> >> case SINK_INPUT_MESSAGE_REWIND: >> >> pa_sink_input_assert_io_context(u->sink_input); >> >> - if (PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state)) >> + if (PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state) && u->pop_called) >> pa_memblockq_seek(u->memblockq, -offset, PA_SEEK_RELATIVE, true); >> else >> - pa_memblockq_flush_write(u->memblockq, true); >> + /* If the sink is not yet playing, adjust the buffer to contain >> + * somewhat more than the configured latency, a rewind does not >> + * make sense in that case. Use a quarter of the current sink >> + * latency as safety margin */ >> + memblockq_adjust(u, (int32_t)(-u->configured_sink_latency / 4), false); > Why wouldn't the rewind make sense? (Well, source rewinding in general > is something I'm not sure makes a whole lot sense, but let's ignore > that.) I think the rewinding here means that the source told that the > last bit it sent out should be discarded. Since you're queuing all sent > data in the POST message handler even when the sink isn't running, I > think pa_memblockq_seek() would make sense here to get rid of the > recently received audio. > > Since pa_memblockq_seek() will make the memblockq shorter, I don't > think memblockq_adjust() makes sense. I don't know what your original > purpose of calling memblockq_adjust() here was. Do you think just using memblockq_seek() is sufficient? The current code flushes the memblockq when a rewind is received. The original intention of using memblockq_adjust() was to keep the queue the same length until the first pop() regardless of what happens on the source side, but you are right, a rewind here probably means that the last bit of audio should be discarded. > > As with the earlier code, negating an unsigned 64-bit integer, dividing > it by four and casting it to a signed 32-bit integer works as expected, > but just by chance. > >> >> u->recv_counter -= offset; >> >> @@ -567,6 +709,18 @@ static int sink_input_process_msg_cb(pa_msgobject *obj, int code, void *data, in >> return pa_sink_input_process_msg(obj, code, data, offset, chunk); >> } >> >> +/* Set sink input latency to one third of the overall latency if possible */ > Again, you could mention that the choice of one third is pretty > arbitrary. > >> +static void set_sink_input_latency(struct userdata *u, pa_sink *sink) { >> + pa_usec_t latency, requested_latency; > The indentation is a bit off here. > >> + >> + requested_latency = u->latency / 3; >> + >> + latency = PA_CLAMP(requested_latency , u->min_sink_latency, u->max_sink_latency); > There's an extra space before the ",". > >> + u->configured_sink_latency = pa_sink_input_set_requested_latency(u->sink_input, latency); >> + if (u->configured_sink_latency != requested_latency) >> + pa_log_warn("Cannot set requested sink latency of %0.2f ms, adjusting to %0.2f ms", (double)requested_latency / PA_USEC_PER_MSEC, (double)u->configured_sink_latency / PA_USEC_PER_MSEC); > I'll copy my earlier comment from set_source_output_latency(): > > If the actual configured latency of either sink or source is higher > than expected, then it may be that the configured source latency plus > the configured sink latency is higher than the configured overall > latency. In this case the target overall latency should be pushed up, > so that we don't try to target an impossible latency. > >> +} >> + >> /* Called from output thread context */ >> static void sink_input_attach_cb(pa_sink_input *i) { >> struct userdata *u; >> @@ -596,6 +750,8 @@ static void sink_input_detach_cb(pa_sink_input *i) { >> pa_rtpoll_item_free(u->rtpoll_item_read); >> u->rtpoll_item_read = NULL; >> } >> + >> + u->pop_called = false; >> } >> >> /* Called from output thread context */ >> @@ -649,6 +805,7 @@ static void sink_input_moving_cb(pa_sink_input *i, pa_sink *dest) { >> struct userdata *u; >> char *output_description; >> const char *n; >> + pa_usec_t source_latency; >> >> if (!dest) >> return; >> @@ -665,6 +822,22 @@ static void sink_input_moving_cb(pa_sink_input *i, pa_sink *dest) { >> if ((n = pa_proplist_gets(dest->proplist, PA_PROP_DEVICE_ICON_NAME))) >> pa_source_output_set_property(u->source_output, PA_PROP_MEDIA_ICON_NAME, n); >> >> + /* Set latency and calculate necessary buffer length >> + * In some profile switching situations the source will be invalid here. If so, >> + * skip the buffer adjustment, it will be done when the source becomes valid */ >> + >> + update_latency_boundaries(u, NULL, dest); >> + set_sink_input_latency(u, dest); >> + >> + if (!u->source_output->source) >> + memblockq_adjust(u, 0, true); >> + else { >> + pa_source_output_get_latency(u->source_output, &source_latency); > Why not pa_source_get_latency()? > >> + if (u->send_counter > u->recv_counter) >> + source_latency += pa_bytes_to_usec(u->send_counter - u->recv_counter, &u->sink_input->sample_spec); >> + memblockq_adjust(u, source_latency, true); > In source_output_moving_cb() you took both sink and source latencies > into account, shouldn't you do that here too? > >> + } >> + >> if (pa_sink_get_state(dest) == PA_SINK_SUSPENDED) >> pa_source_output_cork(u->source_output, true); >> else >> @@ -798,6 +971,7 @@ int pa__init(pa_module *m) { >> u->core = m->core; >> u->module = m; >> u->latency = (pa_usec_t) latency_msec * PA_USEC_PER_MSEC; >> + u->pop_called = false; >> >> adjust_time_sec = DEFAULT_ADJUST_TIME_USEC / PA_USEC_PER_SEC; >> if (pa_modargs_get_value_u32(ma, "adjust_time", &adjust_time_sec) < 0) { >> @@ -876,7 +1050,8 @@ int pa__init(pa_module *m) { >> u->sink_input->suspend = sink_input_suspend_cb; >> u->sink_input->userdata = u; >> >> - pa_sink_input_set_requested_latency(u->sink_input, u->latency/3); >> + update_latency_boundaries(u, NULL, u->sink_input->sink); >> + set_sink_input_latency(u, u->sink_input->sink); >> >> pa_source_output_new_data_init(&source_output_data); >> source_output_data.driver = __FILE__; >> @@ -927,7 +1102,8 @@ int pa__init(pa_module *m) { >> u->source_output->suspend = source_output_suspend_cb; >> u->source_output->userdata = u; >> >> - pa_source_output_set_requested_latency(u->source_output, u->latency/3); >> + update_latency_boundaries(u, u->source_output->source, u->sink_input->sink); >> + set_source_output_latency(u, u->source_output->source); >> >> pa_sink_input_get_silence(u->sink_input, &silence); >> u->memblockq = pa_memblockq_new( >> @@ -942,6 +1118,15 @@ int pa__init(pa_module *m) { >> &silence); /* silence frame */ >> pa_memblock_unref(silence.memblock); >> >> + /* Fill the memblockq with silence to avoid underruns at startup. >> + * For dynamic latency devices it is enough to use the configured >> + * loopback latency, else add some safety margin. A quarter of the >> + * configured sink latency should be sufficient */ >> + if(u->sink_input->sink->flags & PA_SINK_DYNAMIC_LATENCY) >> + memblockq_adjust(u, 0, true); >> + else >> + memblockq_adjust(u, (int32_t)(-u->configured_sink_latency / 4), true); > Here again I don't understand why any safety margin beyond the target > latency would be needed. Is it to handle the case where the source is > initially stopped and it takes a while before it pushes out the first > chunk of audio? I think it would be better to handle this similarly to > how you wait until pop() has been called for the first time. You don't > know how long startup delay the source is going to have. As you suspect, the safety margin is for the case that the source takes longer to start up. With the current code underruns cannot be avoided if the source takes too long, because source and sink are started at the same time. Once the sink is running, it will consume data. Do you think I should wait for the source to deliver some data before I connect the sink to the sink input? This would mean a somewhat longer delay before the loopback produces any output because I will have to wait for the sink again. > > Also, this is another instance of the accidentally-correct casting- > unsigned-to-signed mess that appeared a couple of times earlier. > > -- > Tanu > _______________________________________________ > pulseaudio-discuss mailing list > pulseaudio-discuss at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss