Re: [PATCH 28/28] attr: convert to new threadsafe API

Stefan Beller <sbeller@xxxxxxxxxx> · Tue, 11 Oct 2016 11:56:04 -0700

On Tue, Oct 11, 2016 at 11:23 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>>> I find this description a bit confusing.  At least the way I
>>> envisioned was that when this piece of code is run by multiple
>>> people at the same time,
>>>
>>>         static struct git_attr_check *check = NULL;
>>>         git_attr_check_initl(&check, ...);
>>>
>>> we won't waste the "check" by allocated by the first one by
>>> overwriting it with another one allocated by the second one.  So
>>> "the same arguments" does not come into the picture.  A single
>>> variable is either
>>>
>>>  * already allocated and initialised by the an earlier call to
>>>    initl() by somebody else, or
>>>
>>>  * originally NULL when you call initl(), and the implementation
>>>    makes sure that other people wait while you allocate, initialise
>>>    and assign it to the variable, or
>>>
>>>  * originally NULL when you call initl(), but the implementation
>>>    notices that somebody else is doing the second bullet point
>>>    above, and you wait until that somebody else finishes and then
>>>    you return without doing anything (because by that time, "check"
>>>    is filled by that other party doing the second bullet point
>>>    above).
>>>
>>> There is no need to check for "the same arguments".
>>>
>>
>> I see. So we assume that there are no different arguments at the same time,
>> i.e. all threads run the same code when it comes to attrs.
>
> Sorry, but I fail to see how you can jump to that conclusion.
> Puzzled.
>
> You can have many different callsites (think: hits "git grep" finds)
> that call git_attr_check_initl() and they all may be asking for
> different set of attributes.  As long as they are using different
> "check" instance to receive these sets of attributes, they are OK.

Right, but that requires a mutex per callsite; up to now I imagined
a global mutex only, which is how I came to the conclusion.

>
> It is insane to use the same "check" variable to receive sets of
> attributes for different attributes,

I agree.

> be it from the same call or
> different one, it is insane to do this:
>
>         func(char *anotherattr)
>         {
>                 static struct git_attr_check *check = NULL;
>                 git_attr_check_initl(&check, "one", anotherattr, ...);
>
>                 ... use "check" to ask question ...
>         }
>
> The whole point of having a static, initialize-once instance of
> "check" is so that initl() can do potentially heavy preparation just
> once and keep reusing it.  Allowing a later caller of func() to pass
> a value of anotherattr that is different from the one used in the
> first call that did cause initl() to allocate-initialise-assign to
> "check" is simply insane, even there is no threading issue.

I was imagining a file.c like that:

static struct git_attr_check *check = NULL;

void one_func()
{
    git_attr_check_initl(&check, "one", ...);
    ...
}

void two_func()
{
    git_attr_check_initl(&check, "two", ...);
    ...
}

int foo_cmd(const char *prefix int argc, char **argv)
{
    foreach_path(get_paths(...))
        one_func();
    check = NULL;
    foreach_path(get_paths(...))
        two_func();
}

This is correct single threaded code, but as soon as you want to
put phase one,two into threads, as they can be parallelized, this
goes horribly wrong.

>
> And in a threaded environment it is even worse; the first thread may
> call initl() to get one set of attributes in "check" and it may be
> about to ask the question, while the second call may call initl()
> and by your definition it will notice they have different sets of
> attributes and returns different "check"?  Either the earlier one is
> leaked, or it gets destroyed even though the first thread hasn't
> finished with "check" it got.
>
> It is perfectly OK to drop "static" from the above example code.
> Then it no longer is insane--it is perfectly normal code whose
> inefficiency cannot be avoided because it wants to do dynamic
> queries.

I think we had a misunderstanding here, as I was just assuming a
single mutex later on.