Re: [PATCHv3] attr: convert to new threadsafe API

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 14 Oct 2016 08:37:05 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:

> *1* Would we need a wrapping struct around the array of results?

By the way, I do see a merit on the "check" side (tl;dr: but I do
not think "result" needs it, hence I do not see the need for the
"ugly" variants).

Take "archive" for example.  For each path, it wants to see the
attribute "export-ignore" to decide if it is to be omitted.  In
addition, the usual set of attributes used to smudge blobs into the
working tree representation are inspected by the convert.c API as
part of its implementation of convert_to_working_tree().  This
program has at least two sets of <"check", "result"> that are used
by two git_check_attr() callsites that are unaware of each other.

One of the optimizations we discussed is to trim down the attr-stack
(which caches the attributes read from .gitattributes files that are
in effect for the "last" directory that has the path for which
attrbiutes are queried for) by reading/keeping only the entries that
affect the attributes the caller is interested in.  But when there
are multiple callsites that are interested in different sets of
attributes, we obviously cannot do such an optimization without
taking too much cache-invalidation hit.  Because these callsites are
not unaware of each other, I do not think we can say "keep the
entries that affects the union of all active callsites" very easily,
even if it were possible.

But we could tie this cache to "check", which keeps a constant
subset of attributes that the caller is interested in (i.e. each
callsite would keep its own cache that is useful for its query).
While we are single-threaded, "struct git_attr_check" being a
wrapping struct around the array of "what attributes are of
interest?" is a good place to add that per-check attr-stack cache.
When we go multi-threaded, the attr-stack cache must become
per-thread, and needs to be moved to per-thread storage, and such a
per-thread storage would have multiple attr-stack, one per "check"
instance (i.e. looking up the attr-stack may have to say "who/what
thread am I?" to first go to the thread-local storage for the
current thread, where a table of pointers to attr-stacks is kept and
from there, index into that table to find the attr-stack that
corresponds to the particular "check").  We could use the address of
"check" as the key into this table, but "struct git_attr_check" that
wraps the array gives us another option to allocate a small
consecutive integer every time initl() creates a new "check" and use
it as the index into that attr-stack table, as that integer index
can be in the struct that wraps the array of wanted attributes.

	Note. none of the above is a suggestion to do the attr
	caching the way exactly described.  The above is primarily
	to illustrate how a wrapping struct may give us future
	flexibility without affecting a single line of code in the
	user of API.

It may turn out that we do not need to have anything other than the
array of wanted attributes in the "check" struct, but unlike
"result", "check" is shared across threads, and do not have to live
directly on the stack, so we can prepare for flexibility.

I do not foresee a similar need for wrapping struct for "result",
and given that we do want to keep the option of having them directly
on the stack, I am inclined to say we shouldn't introduce one.

If we were still to do the wrapping for result, I would say that
basing it around the FLEX_ARRAY idiom, i.e.

>         struct git_attr_result {
>                 int num_slots;
>                 const char *value[FLEX_ARRAY];
>         };

is a horrible idea.  It would be less horrible if it were

	struct git_attr_result {
		int num_slots;
		const char **value;
	};

then make the API user write via a convenience macro something like
this

	const char *result_values[NUM_ATTRS_OF_INTEREST];
	struct git_attr_result result = {
		ARRAY_SIZE(result_values), &result_values
	};                

instead.  That way, at least the side that implements git_check_attr()
would not have to be type-unsafe like the example of ugliness in the
message I am following-up on.