Re: [PATCHv3 6/6] rpc: Fix connection close callback race condition and memory corruption/crash

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 03/31/2013 10:20 AM, Peter Krempa wrote:
> The last Viktor's effort to fix the race and memory corruption unfortunately
> wasn't complete in the case the close callback was not registered in an
> connection. At that time, the trail of event's that I'll describe later could
> still happend and corrupt the memory or cause a crash of the client (including

s/happend/happen/

> the daemon in case of a p2p migration).
> 
> Consider the following prerequisities and trail of events:
> Let's have a remote connection to a hypervisor that doesn't have a close
> callback registered and the client is using the event loop. The crash happens in
> cooperation of 2 threads. Thread E is the event loop and thread W is the worker
> that does some stuff. R denotes the remote client.
> 
> 1.) W - The client finishes everything and sheds the last reference on the client
> 2.) W - The virObject stuff invokes virConnectDispose that invokes doRemoteClose
> 3.) W - the remote close method invokes the REMOTE_PROC_CLOSE RPC method.
> 4.) W - The thread is preempted at this point.
> 5.) R - The remote side recieves the close and closes the socket.

s/recieves/receives/

> 6.) E - poll() wakes up due to the closed socket and invokes the close callback
> 7.) E - The event loop is preempted right before remoteClientCloseFunc is called
> 8.) W - The worker now finishes, and frees the conn object.
> 9.) E - The remoteClientCloseFunc accesses the now-freed conn object in the
>         attempt to retrieve pointer for the real close callback.
> 10.) Kaboom, corrupted memory/segfault.
> 
> This patch tries to fix this by introducing a new object that survives the
> freeing of the connection object. We can't increase the reference count on the
> connection object itself as the connection would never be closed as the

s/itself as/itself or/; s/closed as/closed, as/

> connection is closed only when the reference count reaches zero.
> 
> The new object - virConnectCloseCallbackData - is a lockable object that keeps
> the pointers to the real user registered callback and ensures that the
> connection callback is either not called if the connection was already freed or
> that the connection isn't freed while this is being called.
> ---
>  src/datatypes.c            | 55 ++++++++++++++++++++++++++++++++++++--------
>  src/datatypes.h            | 22 ++++++++++++++----
>  src/libvirt.c              | 29 ++++++++++++-----------
>  src/remote/remote_driver.c | 57 +++++++++++++++++++++++++++-------------------
>  4 files changed, 112 insertions(+), 51 deletions(-)

So far, this is just a code review.  I'm also planning on testing the
patch, and will report back with results later in the day.

> @@ -63,14 +65,19 @@ static void virStoragePoolDispose(void *obj);
>  static int
>  virDataTypesOnceInit(void)
>  {
> -#define DECLARE_CLASS(basename)                                  \
> -    if (!(basename ## Class = virClassNew(virClassForObject(),   \
> +#define DECLARE_CLASS_COMMON(basename, parent)                   \
> +    if (!(basename ## Class = virClassNew(parent,                \
>                                            #basename,             \
>                                            sizeof(basename),      \
>                                            basename ## Dispose))) \
>          return -1;
> +#define DECLARE_CLASS(basename)                                  \
> +    DECLARE_CLASS_COMMON(basename, virClassForObject())
> +#define DECLARE_CLASS_LOCKABLE(basename)                         \
> +    DECLARE_CLASS_COMMON(basename, virClassForObjectLockable())

Wonder if these definitions are useful enough to rename to VIR_DECLARE_*
and move into virobject.h.  But that would be a separate patch, to
adjust all clients that would benefit from it.

> +++ b/src/datatypes.h
> @@ -93,6 +93,22 @@ extern virClassPtr virStoragePoolClass;
>  # define VIR_IS_DOMAIN_SNAPSHOT(obj) \
>      (VIR_IS_SNAPSHOT(obj) && VIR_IS_DOMAIN((obj)->domain))
> 
> +
> +typedef struct _virConnectCloseCallbackData virConnectCloseCallbackData;
> +typedef virConnectCloseCallbackData *virConnectCloseCallbackDataPtr;
> +
> +/**
> + * Internal structure holding data related to connection close callbacks.
> + */
> +struct _virConnectCloseCallbackData {
> +    virObjectLockable parent;
> +
> +    virConnectPtr conn;
> +    virConnectCloseFunc callback;
> +    void *opaque;
> +    virFreeCallback freeCallback;
> +};

Would it make sense to move this struct definition to be local to
datatypes.c, and have all other uses of it treat it as opaque?...

> +++ b/src/libvirt.c
> @@ -20189,24 +20189,27 @@ int virConnectRegisterCloseCallback(virConnectPtr conn,
>      virObjectRef(conn);
> 
>      virMutexLock(&conn->lock);
> +    virObjectLock(conn->closeCallback);
> 
>      virCheckNonNullArgGoto(cb, error);
> 
> -    if (conn->closeCallback) {
> +    if (conn->closeCallback->callback) {

...But then you would need to expose an accessor function instead of
directly accessing closeCallback->callback.  Okay, probably fine as is.

> +++ b/src/remote/remote_driver.c
> @@ -337,32 +337,38 @@ enum virDrvOpenRemoteFlags {
>      VIR_DRV_OPEN_REMOTE_AUTOSTART = (1 << 2), /* Autostart a per-user daemon */
>  };
> 
> +static void
> +remoteClientCloseFreeFunc(void *opaque)
> +{
> +    virConnectCloseCallbackDataPtr cbdata = opaque;
> +
> +    virObjectUnref(cbdata);
> +}

You shouldn't need this; just use virObjectFreeCallback instead.

> +remoteClientCloseFunc(virNetClientPtr client ATTRIBUTE_UNUSED,
> +                      int reason,
> +                      void *opaque)
>  {
> -    virConnectPtr conn = opaque;
> +    virConnectCloseCallbackDataPtr cbdata = opaque;
> 
> -    virMutexLock(&conn->lock);
> -    if (conn->closeCallback) {
> -        virConnectCloseFunc closeCallback = conn->closeCallback;
> -        void *closeOpaque = conn->closeOpaque;
> -        virFreeCallback closeFreeCallback = conn->closeFreeCallback;
> -        unsigned closeUnregisterCount = conn->closeUnregisterCount;
> +    virObjectLock(cbdata);
> 
> -        VIR_DEBUG("Triggering connection close callback %p reason=%d",
> -                  conn->closeCallback, reason);
> -        conn->closeDispatch = true;
> -        virMutexUnlock(&conn->lock);
> -        closeCallback(conn, reason, closeOpaque);
> -        virMutexLock(&conn->lock);
> -        conn->closeDispatch = false;
> -        if (conn->closeUnregisterCount != closeUnregisterCount &&
> -            closeFreeCallback)
> -            closeFreeCallback(closeOpaque);
> +    if (cbdata->callback) {
> +        VIR_DEBUG("Triggering connection close callback %p reason=%d, opaque=%p",
> +                  cbdata->callback, reason, cbdata->opaque);
> +        cbdata->callback(cbdata->conn, reason, cbdata->opaque);
> +
> +        if (cbdata->freeCallback)
> +            cbdata->freeCallback(cbdata->opaque);
> +        cbdata->callback = NULL;
> +        cbdata->freeCallback = NULL;
>      }
> -    virMutexUnlock(&conn->lock);
> +    virObjectUnlock(cbdata);
> +
> +    /* free the connection reference that comes along with the callback
> +     * registration */
> +    virObjectUnref(cbdata->conn);
>  }

Took me a couple reads, but it looks right.  The old code had to
temporarily drop connection lock while using the callback; the new code
never even grabs connection lock because all the callback data is
encapsulated in a separate object.

> 
>  /* helper macro to ease extraction of arguments from the URI */
> @@ -765,9 +771,11 @@ doRemoteOpen(virConnectPtr conn,
>              goto failed;
>      }
> 
> +    virObjectRef(conn->closeCallback);
> +
>      virNetClientSetCloseCallback(priv->client,
>                                   remoteClientCloseFunc,
> -                                 conn, NULL);
> +                                 conn->closeCallback, remoteClientCloseFreeFunc);

Here's where virObjectFreeCallback comes in handy.

On the surface, the code seems reasonable.  If my testing passes, and
you fix up the typos and the unneeded helper function, then I don't mind
giving:

ACK.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list

[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]