Re: races in dict_foreach() causing crashes in tier-file-creat.t

Xavier Hernandez <xhernandez@xxxxxxxxxx> · Mon, 14 Mar 2016 09:31:24 +0100

Hi,

On 11/03/16 13:55, Jeff Darcy wrote:
Raghavendra G and I discussed about this problem and the right way to
fix it is to take a copy(without dict_foreach) of the dictionary in
dict_foreach inside a lock and then loop over the local dictionary. I am
worried about the performance implication of this

I'm worried about the correctness implications.  Any such copy will have
to do the equivalent of dict_foreach even if it doesn't call the function
of that name, and will be subject to the same races.

Also included Xavi, who earlier said we need to change dict.c but it is
a bigger change. May be the time has come? I would love to gather all
your inputs and implement a better version of dict if we need one.

There are three solutions I can think of.

(1) Have tier use STACK_WIND_COOKIE to pass the address of a lock down
both paths, so the two can synchronize between themselves.

(2) Have tier issue the lookup down the two paths *serially* instead
of in parallel, so there's no contention on the dictionary.  This is
probably most complicated *and* worst for performance, but I include
it for the sake of completeness.

(3) Enhance dict_t with a gf_lock_t that can be used to serialize
access.  We don't have to use the lock in every invocation of
dict_foreach (though we should probably investigate that).  For
now, we can just use it in the code paths we know are contending.

I suspect Xavi was suggesting (3) and I concur that it's the best
long-term solution.

My initial idea was quite different and probably unpractical to 
implement due to its implications in existing code. However here it is:

The idea was to have a dict based on refcounting controlled by caller 
(instead of callee like it's done now) and without locks (only atomic 
operations to increase/decrease refcounting). When some dict needs to be 
modified, it's only done if refcount is 1, otherwise a copy is made 
before changing it. This means that any dict with more that one 
reference becomes read-only avoiding any interference between xlators.

The most critical change is the caller controlled refcounting. 
Currently, when some xlator needs to use a dict, it takes a ref. This 
makes impossible to know if the already existing ref (owned by the 
caller) will be used or not, so we should always do copies of dicts, 
which is inefficient.

When it's the caller who controls the refcounting, if it still needs the 
reference to do some other work after calling a function that needs the 
dict, it will increase the refcounting. Otherwise it will simply "give" 
its reference to the called function. With this approach we'll need to 
distinguish between functions that really need their own reference form 
those that can share the reference of the caller.

Example:

  Current code:

     dict = ...;

     /* We won't use dict anymore in this function, but we still hold
        a reference on the dict */
     do_something(..., dict, ...);
     /* We release our reference */
     dict_unref(dict);

     ---------

     /* Function that needs its own reference to dict */
     do_something(..., dict_t *dict, ...) {
         dict_ref(dict);

         /* We do something in background that uses the dict. It will
            take care of releasing the ref when not needed anymore. */
         do_something_in_background(..., dict, ...);
     }

     /* Function that doesn't need its own reference to dict */
     do_something(..., dict_t *dict, ...) {
         /* We do something with the dict here, but we don't return
            until all work has been done. In this case we can share the
            ref of the caller. */
         <do something with dict>
     }

  Proposed change:

    dict = ...;

    /* We won't use dict anymore, so we pass our reference to
       do_something() */
    do_something(..., dict, ...);

    /* If do_something() shares our reference, we'll need to release it
       here.

       dict_unref(dict);

    */

    /* Here we cannot use dict because we don't have any reference to
       it */

    ---------

    /* Function that needs its own reference to dict */
    do_something(..., dict_t *dict, ...) {
        /* We already have a reference that we can use */
        ...
        /* We do something in background that uses the dict. It will
           take care of releasing the ref when not needed anymore. */
        do_something_in_background(..., dict, ...);
        /* dict cannot be used here unless a ref has been taken before
           calling do_something_in_background(). */
    }

    /* Function that doesn't need its own reference to dict */
    do_something(..., dict_t *dict, ...) {
        /* We do something with the dict here, but we don't return
           until all work has been done. In this case we can share the
           ref of the caller. */
        <do something with dict>
    }

if we need to keep a reference on the caller:

    dict = ...;

    /* Acquire a new reference for do_something() if it won't share our
       reference */
    dict_ref(dict);

    do_something(..., dict, ...);

    /* Do whatever we need with our reference on dict. Note that our
       dict will be clean, exactly as we had it before calling
       do_something(), whatever this function has done. */
    <do anything on dict>

    /* Release our ref */
    dict_unref(dict);

Xavi

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel