On Fri, 2017-05-05 at 16:21 +0200, Georg Chini wrote: > On 04.05.2017 20:11, Tanu Kaskinen wrote: > > On Wed, 2017-05-03 at 22:19 +0200, Georg Chini wrote: > > > On 03.05.2017 21:58, Tanu Kaskinen wrote: > > > > On Tue, 2017-05-02 at 07:12 +0200, Georg Chini wrote: > > > > > On 01.05.2017 22:10, Tanu Kaskinen wrote: > > > > > > On Mon, 2017-04-24 at 19:33 +0200, Georg Chini wrote: > > > > > > > There are several places in module-echo-cancel where a segfault is > > > > > > > possible when the master sink or source is invalid. > > > > > > > > > > > > I don't think the rewind, volume and mute callbacks are ever called > > > > > > during stream moves, at least with the current code base. However, > > > > > > adding the extra checks does no harm either, so I don't mind, but I > > > > > > think the commit message should be clarified on this point. > > > > > > > > > > I do not mention a move at all in the commit message, I just > > > > > say there are several places where a segfault is possible. > > > > > > > > The patch adds sink_input->sink and source_output->source checks, and > > > > checking those pointers is only relevant during stream moves (or during > > > > stream initialization before the routing has been decided, but the > > > > callbacks aren't used that early). At all other times the sink and > > > > source pointers are non-NULL. > > > > > > > > > I think the rewind callback is called during a move, the > > > > > PA_SINK_MESSAGE_START_MOVE calls request_rewind(). > > > > > > > > That happens before detaching the sink input, so the sink pointer is > > > > not NULL. > > > > > > This seems not correct. At least after fixing the first bug, the next > > > crash occurred in pa_sink_invalidate_requested_latency() during > > > PA_SINK_MESSAGE_START_MOVE. The request_rewind is just a few > > > lines below. > > > > > > There must be (at least for echo-cancel) a possible situation > > > where the sink-input is moved from (null) to a real sink, otherwise > > > the crash I fixed could not happen at all. > > > > Sorry, I mixed the request rewind and process rewind callbacks. When > > pa_sink_request_rewind() is called on the virtual sink, the request > > gets propagated to the master sink, but in this case the filter sink is > > not connected to the master sink. > > > > I remember fighting with this case before... (looking up old stuff...) > > Last time it was module-device-manager that was moving a stream from an > > echo-cancel sink to the auto null sink during an alsa card profile > > change. So the case was quite similar, just a different policy module > > (device-manager instead of switch-on-connect). > > > > The fix that was used last time is not really applicable here. Part of > > the fix was to set the DONT_MOVE flags to module-echo-cancel's streams, > > but those flags are only set when module-echo-cancel is autoloaded by > > module-filter-apply, and apparently in this case module-echo-cancel is > > manually loaded (otherwise this situation shouldn't happen). > > > > We haven't so far supported moving streams that are connected to an > > already-moving filter sink, and I don't think this patch is a good way > > to add that support. The crash happens in some IO thread, but since the > > filter sink is not attached to any master sink, it's not clear which IO > > thread that is. Probably it's the old IO thread (the sink asyncmsgq > > pointer gets updated in the moving callback), but my point is that if > > it seems to work, it works by accident. After the filter sink has been > > detached from its master, the old IO thread could be already destroyed, > > so relying on it still running seems like a bad idea. If we want to > > support moving streams while not connected to any IO thread, I think > > the system design should be thought through properly. > > After going through the code for I while, I believe the only possible > situation where such a "move within a move" can happen is when > the alsa card is switching the profile. In that situation we know > that the old thread still exists, because the sink is destroyed after > calling start_move() for all connected inputs. In all other cases the > move is either completed or fails before another move can be started. > > > > > I propose applying this patch (the commit message needs to be > > rewritten): https://patchwork.freedesktop.org/patch/68741/ > > > > That patch was not applied, because it was not needed after all to > > solve the bug last time, and because Arun didn't like it, but I think > > it makes sense and it should fix this crash. It will prevent module- > > switch-on-connect from moving the application stream to the auto null > > sink, which is fine. > > I agree with you that my patch is wrong, but I think there is a simpler > solution than your patch. The reason why the move is triggered at all > is because the virtual sink becomes the default sink when no other > sink is available. Then a null-sink turns up and the streams are moved > away from the default sink. This does not make sense anyway for a > virtual sink, we don't want the streams to move away from there. > So the simple fix is to prevent the virtual sink from being the default > sink when it is not connected to a master, then the move will not occur. > If you are OK with that approach, I would send a patch. By "the virtual sink", do you mean the null sink? Would you want it to be impossible to make any null sink the default sink? That doesn't seem like a good idea, so do you propose some more nuanced policy? I believe preventing the move to the null sink isn't enough anyway, because module-switch-on-connect will again try to move all streams when the sink of the new alsa profile is created. That happens before the echo-cancel sink has been attached to the new alsa sink. In defence of my patch, I believe it makes sense regardless of whether we make also another patch that avoids this specific situation. Moving a stream from a moving filter sink is unsupported, and will stay unsupported for the foreseeable future. Preventing all such move attempts seems like a good idea to me. > BTW, we can then also revert my first commits that are already in the > tree. True. -- Tanu https://www.patreon.com/tanuk