On Mon, Jun 13, 2016 at 1:55 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > I hate to be doing this, but we need yet another revamp to the attr > API that affects all the callers. So you don't mean origin/jc/attr-more by this? (Given that we have jc/attr and jc/attr-more, the third thing could go with jc/even-more-attr. Though I do not propose that) > > In the original design, a codepath that wants to check attributes > repeatedly for many paths (e.g. "convert" that wants to see what > crlf, ident, filter, eol and text attributes are set to for paths > that it is asked to munge the blob contents for) used to allocate > > static struct git_attr_check { > struct git_attr *attr; > const char *value; > } ccheck[NUM_ATTRS]; > > and populated the array when the first time the codepath was > entered, e.g. > > if (!ccheck[0].attr) { > for (i = 0; i < NUM_ATTRS; i++) > ccheck[i].attr = git_attr(...); > } > > and then make a call to the API to ask for these attributes in > ccheck[] for a path, i.e. > > git_attr_check(path, NUM_ATTRS, ccheck); > > The API function will fill in the ccheck[].value fields and the > caller will read from there how each attribute is set for the path. > > This was updated with recent jc/attr topic so that attr_check > structure can be enriched to keep _more_ state, such as the > pre-parsed representation of all the contents of the relevant > .gitattributes files, which currently is held in a program-side > singleton instance called attr_stack, partly in preparation for > Stefan's sb/pathspec-label topic that is expected to place a lot > heavier load to the attribute subsystem. I do not quite expect that. Some numbers: (I am looking at https://android.googlesource.com/platform/manifest) * There are currently different 56 groups in that manifest * with an average of 1.5 groups for each entry that has at least one group, * or 0.88 groups per entry (which is one resulting file/submodule after the conversion) For reference: * git.git makes use of 1 attr (whitespace), in 2 value configurations * that is applied to each file. So comparing one entry to the 1.5 or 0.88 above doesn't seem to be that bad. That is what needs to be done on a per-file basis? What is different is the number of different attrs in use by about 2 orders of magnitude, which needs to be collected upfront and stored in a smart way? > > A caller of the API after that update would do more like > > static struct git_attr_check *ccheck; > > if (!ccheck) > ccheck = git_attr_check_initl(...); > git_attr_check(path, ccheck); > for (i = 0; i < NUM_ATTRS; i++) > if (ccheck->check[i].value == ATTR_UNSET) > ... do the Unset thing ... This is origin/jc/attr-more ? > > This however is not thread-safe for obvious two reasons. > > * Two threads can simultanously enter this section of the code and > end up initializing ccheck twice; > > * Even though ccheck[].attr are constant for the purpose of this > codepath (i.e. all threads passing through are interested in > checking the same set of attributes), ccheck[].value fields > cannot be shared across simultaneous threads like the above > snippet does. > > So it appears that the final API visible from the callers needs to > be updated further, perhaps something like: > > > static struct git_attr_check *ccheck; > const char *values[NUM_ATTRS]; > > git_attr_check_initl(&ccheck, "crlf", "ident", ..., NULL); > git_attr_check(path, ccheck, values); and later on each thread will do a git_attr_thread_byebye(&ccheck) ? to free its thread local stuff or rather is the calling site responsible for freeing up all of it after all threads are done? (This is probably too much of a detail question now?) > > where git_attr_check_initl() would do the "if ccheck is NULL then > initialize it to an instance" atomically in threading environment, > and git_attr_check() returns its finding to values[] array the > calling thread exclusively owns, while sharing the input ccheck > and the list of attributes the call inquires among threads. > > Also unlike the earlier plan, attr_stack aka "where in the directory > hierarchy are we asking attributes for?" will not be stored directly > in git_attr_check structure. It is likely that a thread-local > structure will be introduced hidden behind this API (i.e. the > callers do not have to be aware of it), and attr_stack will be > registered in this thread-local structure keyed by &ccheck, so that > <thread, callsite> pair can have attr_stack instance of its own. > > This is because a single attr_stack per ccheck will not work as an > optimization mechanism when multi-threaded. Two threads may be > running the same convert() function, and they may be interested in > the same set of attributes (e.g. "crlf", "ident", etc.) but they > will be working on different parts of the tree, so having attr_stack > per <thread, callsite> would make more sense. > > All of which will be quite a lot of work, though, so do not expect > that it will appear by the end of this week ;-) > > Thanks. > Thanks for the heads up, Stefan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html