Hi, On Wed, Aug 03, 2016 at 12:11:28PM +0200, Christophe Fergeau wrote: > On Tue, Aug 02, 2016 at 11:48:49AM +0200, Victor Toso wrote: > > This patch avoids a race condition. The race happens when we mark the > > xfer-task as completed by receiving VD_AGENT_FILE_XFER_STATUS_SUCCESS > > message while the flush callback was not yet called. > > > > The flush callback is file_xfer_data_flushed_cb() and it might not be > > called immediately after data is flushed. > > > > The race can be verified while transferring several small files at > > once. I can see it often with more then 50 files in one transfer > > operation. > > I've also been able to reliably hit this running in valgrind. > > > > > This fix implies that SpiceMainChannel should check in its async > > callbacks if given SpiceFileTransferTask is completed. > > > > This patch introduces spice_file_transfer_task_is_completed (internal) > > to help check if spice_file_transfer_task_completed() was called or > > not. > > I wondered if we could cancel the pending async callbacks rather than > doing things this way, but it's probably more complicated to do that. Yes, It would be more complicated. I guess this 'completed' state could be converted to a property and we can use g_object_notify() to trigger a the cancellation ..? We also would need to change the GCancellable in this flush_async() function as we are using the SpiceFileTransferTask. Not sure if we would gain much from it but I could give it a try. > > > --- > > src/channel-main.c | 17 +++++++++++------ > > src/spice-file-transfer-task-priv.h | 1 + > > src/spice-file-transfer-task.c | 14 ++++++++++++++ > > 3 files changed, 26 insertions(+), 6 deletions(-) > > > > diff --git a/src/channel-main.c b/src/channel-main.c > > index bc7349f..14ad6cf 100644 > > --- a/src/channel-main.c > > +++ b/src/channel-main.c > > @@ -1766,10 +1766,12 @@ static void file_xfer_data_flushed_cb(GObject *source_object, > > return; > > } > > > > - file_transfer_operation_send_progress(xfer_task); > > - > > - /* Read more data */ > > - spice_file_transfer_task_read_async(xfer_task, file_xfer_read_async_cb, NULL); > > + /* task might be completed while on idle */ > > + if (!spice_file_transfer_task_is_completed(xfer_task)) { > > + file_transfer_operation_send_progress(xfer_task); > > + /* Read more data */ > > + spice_file_transfer_task_read_async(xfer_task, file_xfer_read_async_cb, NULL); > > + } > > } > > > > static void file_xfer_queue_msg_to_agent(SpiceMainChannel *channel, > > @@ -1814,13 +1816,15 @@ static void file_xfer_read_async_cb(GObject *source_object, > > } > > > > file_xfer_queue_msg_to_agent(channel, spice_file_transfer_task_get_id(xfer_task), buffer, count); > > - if (count == 0) { > > - /* on EOF just wait for VD_AGENT_FILE_XFER_STATUS from agent */ > > + if (count == 0 || spice_file_transfer_task_is_completed(xfer_task)) { > > + /* on EOF just wait for VD_AGENT_FILE_XFER_STATUS from agent > > + * in case the task was completed, nothing to do. */ > > return; > > } > > > > task_id = spice_file_transfer_task_get_id(xfer_task); > > xfer_op = g_hash_table_lookup(channel->priv->file_xfer_tasks, GUINT_TO_POINTER(task_id)); > > + g_return_if_fail(xfer_op != NULL); > > xfer_op->stats.total_sent += count; > > > > cancellable = spice_file_transfer_task_get_cancellable(xfer_task); > > @@ -1841,6 +1845,7 @@ static void main_agent_handle_xfer_status(SpiceMainChannel *channel, > > > > switch (msg->result) { > > case VD_AGENT_FILE_XFER_STATUS_CAN_SEND_DATA: > > + g_return_if_fail(spice_file_transfer_task_is_completed(xfer_task) == FALSE); > > spice_file_transfer_task_read_async(xfer_task, file_xfer_read_async_cb, NULL); > > return; > > case VD_AGENT_FILE_XFER_STATUS_CANCELLED: > > diff --git a/src/spice-file-transfer-task-priv.h b/src/spice-file-transfer-task-priv.h > > index 1ceb392..a7b58d8 100644 > > --- a/src/spice-file-transfer-task-priv.h > > +++ b/src/spice-file-transfer-task-priv.h > > @@ -52,6 +52,7 @@ gssize spice_file_transfer_task_read_finish(SpiceFileTransferTask *self, > > GError **error); > > guint64 spice_file_transfer_task_get_file_size(SpiceFileTransferTask *self); > > guint64 spice_file_transfer_task_get_bytes_read(SpiceFileTransferTask *self); > > +gboolean spice_file_transfer_task_is_completed(SpiceFileTransferTask *self); > > > > G_END_DECLS > > > > diff --git a/src/spice-file-transfer-task.c b/src/spice-file-transfer-task.c > > index eb943c4..4e51b7e 100644 > > --- a/src/spice-file-transfer-task.c > > +++ b/src/spice-file-transfer-task.c > > @@ -42,6 +42,7 @@ struct _SpiceFileTransferTask > > GObject parent; > > > > uint32_t id; > > + gboolean completed; > > gboolean pending; > > GFile *file; > > SpiceMainChannel *channel; > > @@ -270,6 +271,8 @@ G_GNUC_INTERNAL > > void spice_file_transfer_task_completed(SpiceFileTransferTask *self, > > GError *error) > > { > > + self->completed = TRUE; > > + > > Maybe add a g_warn_if_fail(self->completed == FALSE); or > g_return_if_fail() at the beginning of thee function? Or is it meant to > be called multiple times? It can be called multiple times, the code bellow is: if (self->pending) { /* Complete but pending is okay only if error is set */ if (self->error == NULL) { self->error = g_error_new(SPICE_CLIENT_ERROR, SPICE_CLIENT_ERROR_FAILED, "Cannot complete task in pending state"); } return; } In case we receive a message from agent that is going to cancel the operation but xfer-task is in the middle of an async operation. The self->error check in the async callback functions are related to this. > > > /* In case of multiple errors we only report the first error */ > > if (self->error) > > g_clear_error(&error); > > @@ -478,6 +481,16 @@ gssize spice_file_transfer_task_read_finish(SpiceFileTransferTask *self, > > return nbytes; > > } > > > > +G_GNUC_INTERNAL > > +gboolean spice_file_transfer_task_is_completed(SpiceFileTransferTask *self) > > +{ > > + g_return_val_if_fail(self != NULL, FALSE); > > + > > + /* File transfer is considered over at the first time it is marked as > > + * complete with spice_file_transfer_task_completed. */ > > + return self->completed; > > Just to be sure, the race we are talking about is not caused by > multithtreading, but only by the order in which things happen with > respect to network/main loop? (checking whether we need mutexes here or > not, I guess not). Yes, I don't think any of this is related to multithreading. I did not follow exactly how we receive the message faster then the idle callback, but I guess the coroutine can be faster then several callback functions schedule to run in idle. > > > +} > > + > > /******************************************************************************* > > * External API > > ******************************************************************************/ > > @@ -735,4 +748,5 @@ static void > > spice_file_transfer_task_init(SpiceFileTransferTask *self) > > { > > self->buffer = g_malloc0(FILE_XFER_CHUNK_SIZE); > > + self->completed = FALSE; > > } > > The instance will be cleared to 0 upon creation, I don't think you need > the explicit FALSE assignment there. Ah, true. I'll remove it. > > Acked-by: Christophe Fergeau <cfergeau@xxxxxxxxxx> Thanks! toso > > Christophe _______________________________________________ Spice-devel mailing list Spice-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/spice-devel