Re: [RFC/PATCH] attr: Document a new possible thread safe API

Stefan Beller <sbeller@xxxxxxxxxx> · Tue, 4 Oct 2016 16:25:27 -0700

On Tue, Oct 4, 2016 at 4:13 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> diff --git a/Documentation/technical/api-gitattributes.txt b/Documentation/technical/api-gitattributes.txt
>> index 92fc32a..940617e 100644
>> --- a/Documentation/technical/api-gitattributes.txt
>> +++ b/Documentation/technical/api-gitattributes.txt
>> @@ -59,7 +59,10 @@ Querying Specific Attributes
>>    empty `struct git_attr_check` can be prepared by calling
>>    `git_attr_check_alloc()` function and then attributes you want to
>>    ask about can be added to it with `git_attr_check_append()`
>> -  function.
>> +  function. git_attr_check_initl is thread safe, i.e. you can call it
>> +  from different threads at the same time; internally however only one
>> +  call at a time is processed. If the calls from different threads have
>> +  the same arguments, the returned `git_attr_check` may be the same.
>
> I do not think this is enough.  Look at the example for _initl() and
> notice that it keeps the "singleton static check that is initialized
> by the very first caller if the caller notices it is NULL" pattern.
>
> One way to hide that may be to pass the address of that singleton
> pointer to _initl(), so that it can do the "has it been initialized?
> If not, let's prepare the thing" under lock.

Oh, I see. Yeah that makes sense.

>
>> @@ -89,15 +92,21 @@ static void setup_check(void)
>>
>>  ------------
>>       const char *path;
>> +     struct git_attr_check *result;
>>
>>       setup_check();
>> -     git_check_attr(path, check);
>> +     result = git_check_attr(path, check);
>
> I haven't formed a firm opinion, but I suspect your thinking might
> be clouded by too much staring at the current implementation that
> has <attr>,<value> pairs inside git_attr_check.  Traditionally, the
> attr subsystem used the same type for the query question and the
> query answer the same type, but it does not have to stay to be the
> case at all.  Have you considered that we are allowed to make these
> two types distinct?

I thought about that, but as I concluded that the get_all_attrs doesn't need
conversion to a threading environment, we can keep it as is.

When keeping the get_all_attrs as is, we need to keep the data structures
as is, (i.e. key,value pair inside git_check_attr), so introducing a new
data type seemed not useful for the threaded part.

>  A caller can share the same question instance
> (i.e. the set of interned <attr>, in git_attr_check) with other
> threads as that is a read-only thing, but each of the callers would
> want to have the result array on its own stack if possible
> (e.g. when asking for a known set of attributes, which is the
> majority of the case) to avoid allocation cost.  I'd expect the most
> typical caller to be
>
>         static struct git_attr_check *check;
>         struct git_attr_result result[2]; /* we want two */
>
>         git_attr_check_initl(&check, "crlf", "ident", NULL);
>         git_check_attr(path, check, result);
>         /* result[0] has "crlf", result[1] has "ident" */
>
> or something like that.

I see, that seems to be a clean API. So git_attr_check_initl
will lock the mutex and once it got the mutex it can either
* return early as someone else did the work
* needs to do the actual work
and then unlock. In any case the work was done.

git_check_attr, which runs in all threads points to the same check,
but gets the different results.

Ok, I'll go in that direction then.

Thanks,
Stefan