Re: Re attr API further revamp

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 13, 2016 at 1:55 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> I hate to be doing this, but we need yet another revamp to the attr
> API that affects all the callers.

So you don't mean origin/jc/attr-more by this?
(Given that we have jc/attr and jc/attr-more, the third thing could go
with jc/even-more-attr. Though I do not propose that)

>
> In the original design, a codepath that wants to check attributes
> repeatedly for many paths (e.g. "convert" that wants to see what
> crlf, ident, filter, eol and text attributes are set to for paths
> that it is asked to munge the blob contents for) used to allocate
>
>         static struct git_attr_check {
>                 struct git_attr *attr;
>                 const char *value;
>         } ccheck[NUM_ATTRS];
>
> and populated the array when the first time the codepath was
> entered, e.g.
>
>         if (!ccheck[0].attr) {
>                 for (i = 0; i < NUM_ATTRS; i++)
>                         ccheck[i].attr = git_attr(...);
>         }
>
> and then make a call to the API to ask for these attributes in
> ccheck[] for a path, i.e.
>
>         git_attr_check(path, NUM_ATTRS, ccheck);
>
> The API function will fill in the ccheck[].value fields and the
> caller will read from there how each attribute is set for the path.
>
> This was updated with recent jc/attr topic so that attr_check
> structure can be enriched to keep _more_ state, such as the
> pre-parsed representation of all the contents of the relevant
> .gitattributes files, which currently is held in a program-side
> singleton instance called attr_stack, partly in preparation for
> Stefan's sb/pathspec-label topic that is expected to place a lot
> heavier load to the attribute subsystem.

I do not quite expect that. Some numbers:
(I am looking at https://android.googlesource.com/platform/manifest)
* There are currently different 56 groups in that manifest
* with an average of 1.5 groups for each entry that has at least one group,
* or 0.88 groups per entry (which is one resulting file/submodule after the
  conversion)

For reference:

* git.git makes use of 1 attr (whitespace), in 2 value configurations
* that is applied to each file.

So comparing one entry to the 1.5 or 0.88 above doesn't seem to
be that bad. That is what needs to be done on a per-file basis?

What is different is the number of different attrs in use by about
2 orders of magnitude, which needs to be collected upfront and stored
in a smart way?

>
> A caller of the API after that update would do more like
>
>         static struct git_attr_check *ccheck;
>
>         if (!ccheck)
>                 ccheck = git_attr_check_initl(...);
>         git_attr_check(path, ccheck);
>         for (i = 0; i < NUM_ATTRS; i++)
>                 if (ccheck->check[i].value == ATTR_UNSET)
>                         ... do the Unset thing ...

This is origin/jc/attr-more ?

>
> This however is not thread-safe for obvious two reasons.
>
>  * Two threads can simultanously enter this section of the code and
>    end up initializing ccheck twice;
>
>  * Even though ccheck[].attr are constant for the purpose of this
>    codepath (i.e. all threads passing through are interested in
>    checking the same set of attributes), ccheck[].value fields
>    cannot be shared across simultaneous threads like the above
>    snippet does.
>
> So it appears that the final API visible from the callers needs to
> be updated further, perhaps something like:
>
>
>         static struct git_attr_check *ccheck;
>         const char *values[NUM_ATTRS];
>
>         git_attr_check_initl(&ccheck, "crlf", "ident", ..., NULL);
>         git_attr_check(path, ccheck, values);

and later on each thread will do a

        git_attr_thread_byebye(&ccheck) ?

to free its thread local stuff or rather is the calling site responsible for
freeing up all of it after all threads are done?
(This is probably too much of a detail question now?)

>
> where git_attr_check_initl() would do the "if ccheck is NULL then
> initialize it to an instance" atomically in threading environment,
> and git_attr_check() returns its finding to values[] array the
> calling thread exclusively owns, while sharing the input ccheck
> and the list of attributes the call inquires among threads.
>
> Also unlike the earlier plan, attr_stack aka "where in the directory
> hierarchy are we asking attributes for?" will not be stored directly
> in git_attr_check structure.  It is likely that a thread-local
> structure will be introduced hidden behind this API (i.e. the
> callers do not have to be aware of it), and attr_stack will be
> registered in this thread-local structure keyed by &ccheck, so that
> <thread, callsite> pair can have attr_stack instance of its own.
>
> This is because a single attr_stack per ccheck will not work as an
> optimization mechanism when multi-threaded.  Two threads may be
> running the same convert() function, and they may be interested in
> the same set of attributes (e.g. "crlf", "ident", etc.) but they
> will be working on different parts of the tree, so having attr_stack
> per <thread, callsite> would make more sense.
>
> All of which will be quite a lot of work, though, so do not expect
> that it will appear by the end of this week ;-)
>
> Thanks.
>

Thanks for the heads up,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]