Re: [PATCHv3] attr: convert to new threadsafe API

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 12 Oct 2016 16:33:31 -0700

Stefan Beller <sbeller@xxxxxxxxxx> writes:

> @@ -89,15 +114,20 @@ static void setup_check(void)
>  
>  ------------
>  	const char *path;
> +	struct git_attr_result *result;
>  
>  	setup_check();
> -	git_check_attr(path, check);
> +	result = git_check_attr(path, check);

This looks stale by a few revisions of the other parts of the patch?

> diff --git a/archive.c b/archive.c
> index 11e3951..15849a8 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -107,10 +107,12 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
>  		void *context)
>  {
>  	static struct strbuf path = STRBUF_INIT;
> +	static struct git_attr_check *check;
> +
>  	struct archiver_context *c = context;
>  	struct archiver_args *args = c->args;
>  	write_archive_entry_fn_t write_entry = c->write_entry;
> -	static struct git_attr_check *check;
> +	struct git_attr_result result = GIT_ATTR_RESULT_INIT;
>  	const char *path_without_prefix;
>  	int err;
>  
> @@ -124,12 +126,16 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
>  		strbuf_addch(&path, '/');
>  	path_without_prefix = path.buf + args->baselen;
>  
> -	if (!check)
> -		check = git_attr_check_initl("export-ignore", "export-subst", NULL);
> -	if (!git_check_attr(path_without_prefix, check)) {
> -		if (ATTR_TRUE(check->check[0].value))
> +	git_attr_check_initl(&check, "export-ignore", "export-subst", NULL);
> +	git_attr_result_init(&result, check);
> +
> +	if (!git_check_attr(path_without_prefix, check, &result)) {
> +		if (ATTR_TRUE(result.value[0])) {
> +			git_attr_result_clear(&result);
>  			return 0;
> -		args->convert = ATTR_TRUE(check->check[1].value);
> +		}
> +		args->convert = ATTR_TRUE(result.value[1]);
> +		git_attr_result_clear(&result);
>  	}

This is exactly what I meant by "can we avoid alloc/free of result
in leaf function when we _know_ how many attributes we are
interested in already, which is the majority of the case?".

Starting with a simple but unoptimized internal implementation of
the attr subsystem is one thing (which is good).  Exposing an API that
cannot be optimally implemented later without changing the callers
is another (which is bad).

By encapsulating each element into "struct git_attr_result", we can
extend the API without changing the API user [*1*].  

But I do not think of a way to allow an efficient implementation
later unless the source of the API user somehow says "this user is
only interested in this many attributes", like having this

	struct git_attr_result result[2];

(because this caller only wants "ignore" and "subst") on the API
user's side [*2*].  Without such a clue (like the patch above, that
only says "there is a structure called 'result'"), I do not think of
a way to avoid dynamic allocation on the API implementation side.

All the other callers in the patch (pack-objects, convert, ll-merge,
etc.) seem to share the exact same pattern.  Each of the leaf
functions knows a fixed set of attributes it is interested in, the
caller iterates over many paths and makes calls to these leaf
functions, and it is a waste to pay alloc/free overhead for each and
every iteration when we know how many elements result needs to
store.

[Footnote]

*1* Would we need a wrapping struct around the array of results?  If
    that is the case, we may need something ugly like this on the
    API user side:

	GIT_ATTR_RESULT_TYPE(2) result = {2,};

    with something like the following on the API implementation
    side:

        #define GIT_ATTR_RESULT_TYPE(n) \
            struct { \
                    int num_slots; \
                    const char *value[n]; \
            }

        struct git_attr_result {
                int num_slots;
                const char *value[FLEX_ARRAY];
        };
        git_attr_result_init(void *result_, struct git_attr_check *check)
        {
                struct git_attr_result *result = result_;

                assert(result->num_slots, check->num_attrs);
                ...
        }                
        git_check_attr(const char *path,
                       struct git_attr_check *check,
                       void *result_)
        {                       
                struct git_attr_result *result = result_;

                assert(result->num_slots, check->num_attrs);
                for (i = 0; i < check_num_attrs; i++)
                        result->value[i] = ... found value ...;
        }

*2* Or the uglier

	GIT_ATTR_RESULT_TYPE(2) result = {2,};

    I can see why the "check" side would benefit from a structure
    that contains an array, but I do not see why "result" side would
    want to, so I am hoping that we won't have to do this uglier
    variant and just go with the simple "array of resulting values".