Hi,
On 30-06-16 09:08, Christophe Fergeau wrote:
On Wed, Jun 29, 2016 at 07:29:18PM +0200, Hans de Goede wrote:
Erm no, it is the caller's responsibility to make sure that all
spice_usbredir_channel's are properly torn down before the
usbdevicemanager gets torn down.
Otherwise you're not just having the problem of the thread continuing
to run, but also you still have open usbfs file-descriptors and any
redirected devices will not become available to the client-os again
(think e.g. virt-manager with multiple spice ontexts).
The proper fix is to fix the channel already being set to NULL,
without first calling spice_usbredir_channel_disconnect_device().
Yeah, I agree with most of this, however if the API is misused (and
actually it has been for years...), it still much better to cleanup the
thread rather than leaving it hanging around until it triggers a crash.
However, I should have made this commit more explicit that something bad
happened first, ie:
spice_usb_device_manager_stop_event_listening(self);
if (g_atomic_int_get(&priv->event_thread_run)) {
/* Force termination of the event thread even if there were some mismatched
* spice_usb_device_manager_{start,stop}_event_listening
* calls. This should not happen if the API is properly
* used, but if it's not, this will avoid a crash/thread
* leak */
g_warn_if_reached();
g_atomic_int_set(&priv->event_thread_run, FALSE);
}
(I'll send a v2 doing that, and add more precisions to the commit log
that it's not a real fix)
In this case spice_usbredir_channel_disconnect_device() is called with a
non-NULL channel:
#0 0x00007fc80b449dc4 in spice_usbredir_channel_disconnect_device
(channel=0x55aa2bc55060 [SpiceUsbredirChannel]) at
channel-usbredir.c:459
#1 0x00007fc80b449fb3 in _disconnect_device_thread (task=0x55aa2c7ad910
[GTask], object=0x55aa2bc55060, task_data=0x0, cancellable=0x0) at
channel-usbredir.c:508
#2 0x00007fc83f0fab10 in g_task_thread_pool_thread
(thread_data=0x55aa2c7ad910, pool_data=<optimized out>) at gtask.c:1288
#3 0x00007fc83f875735 in g_thread_pool_thread_proxy (data=<optimized
out>) at gthreadpool.c:307
#4 0x00007fc83f874d38 in g_thread_proxy (data=0x55aa2c93fed0) at
gthread.c:780
#5 0x00007fc84244f5ca in start_thread (arg=0x7fc8193b3700) at
pthread_create.c:333
#6 0x00007fc841a77ead in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
but SpiceChannel::session has already been set to NULL, so this specific
code block is not going to run:
SpiceSession *session = spice_channel_get_session(SPICE_CHANNEL(channel));
if (session != NULL)
spice_usb_device_manager_stop_event_listening(spice_usb_device_manager_get(session, NULL));
The session is set to NULL through:
#0 0x00007fc80b4221ed in spice_session_channel_destroy (session=0x55aa2ba69fa0 [SpiceSession], channel=0x55aa2bc55060 [SpiceUsbredirChannel]) at spice-session.c:2280
#1 0x00007fc80b41c70e in session_disconnect (self=0x55aa2ba69fa0 [SpiceSession], keep_main=0)
at spice-session.c:314
#2 0x00007fc80b420a3b in session_disconnect_idle (self=0x55aa2ba69fa0 [SpiceSession])
at spice-session.c:1908
#3 0x00007fc83f84e703 in g_main_context_dispatch (context=0x55aa2b09d320) at gmain.c:3154
I"m a bit wary of touching the device disconnection/channel teardown code paths
though as I'm not familiar with them :((
So it seems there are 2 problems:
1) The existence of the "if (session)" in:
if (session != NULL)
spice_usb_device_manager_stop_event_listening(spice_usb_device_manager_get(session, NULL));
git blame gives me:
https://cgit.freedesktop.org/spice/spice-gtk/commit/?id=e3932bfebbfec7637f3d03d90e8f9b75e3223236
Which says this was done to fix a segfault. Which clearly is a bad idea,
just adding if (pointer) checks to avoid segfaults is not the way to fix
bugs people!
So to fix this we need to root cause said segfault, fix it and then revert this
commit.
2) spice_usbredir_channel_disconnect_device now running in a thread, rather
then being a synchronous part of the tear-down sequence, this can either be
due to someone calling spice_usbredir_channel_disconnect_device_async() or
someone calling spice_usbredir_channel_reset(). Note that if either happens
just before session_disconnect() gets called we've a problem as then
the channel will be torn down while disconnect is running async, looking
at your backtraces that is exactly what is happening. This all seems to be
fallout from making usb device connect / disconnect async because windows
is oh so terribly slow with this. Since you cannot fix windows, you at
least not to make sure that this all gets to run its course properly
on shutdown / session close. Note that things being able to run their
course properly is also important to get the windows filter driver reconfigured
to give the redirected usb device back to windows proper / to rebind the
linux kernel driver to the usb device. Without this the client-os
will not be able to access the device even after closing virt-viewer /
whatever.
Regards,
Hans
_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/spice-devel