A common pattern to check N attributes for many paths is to (1) prepare an array A of N git_attr_check_elem items; (2) call git_attr() to intern the N attribute names and fill A; (3) repeatedly call git_check_attrs() for path with N and A; A look-up for these N attributes for a single path P scans the entire attr_stack, starting from the .git/info/attributes file and then .gitattributes file in the directory the path P is in, going upwards to find .gitattributes file found in parent directories. An earlier commit 06a604e6 (attr: avoid heavy work when we know the specified attr is not defined, 2014-12-28) tried to optimize out this scanning for one trivial special case: when the attribute being sought is known not to exist, we do not have to scan for it. While this may be a cheap and effective heuristic, it would not work well when N is (much) more than 1. What we would want is a more customized way to skip irrelevant entries in the attribute stack, and the definition of irrelevance is tied to the set of attributes passed to git_check_attrs() call, i.e. the set of attributes being sought. The data necessary for this optimization needs to live alongside the set of attributes, but a simple array of git_attr_check_elem simply does not have any place for that. Introduce "struct git_attr_check" that contains N, the number of attributes being sought, and A, the array that holds N git_attr_check_elem items, and a function git_check_attr() that takes a path P and this structure as its parameters. This structure can later be extended to hold extra data necessary for optimization. Side Note: While in this series, I am not interested in specifying the exact optimization, it would help readers to give what is envisioned. Even when the caller is interested in only a few attributes (say, "diff", "text" and "group-doc"), we'd walk the attribute entries in various .gitattributes files, perform the pattern match and collect attributes for the path. An earlier commit 06a604e6 (attr: avoid heavy work when we know the specified attr is not defined, 2014-12-28) tried to optimize out this scanning for one trivial special case: where an asked-for attribute does not appear anywhere in these files, we know we do not have to scan at all. We can do much better. We can scan these attribute entries that came from .gitattributes files once and mark the ones that affect one or more of the attributes we know the caller is interested in. Then we only go through the marked entries. E.g. if the caller is interested in "diff", "text" and "group-doc" attributes, we can mark entries that talks about any of "diff", "text", "group-doc" and "binary" attributes. The last one is because it is an attribute macro that expands to affect "diff" and "text" (these attribute are unset by having "binary"). The way git_attr_check() API is structured expects that the caller expresses the set of attributes it is interested in upfront, and reuses to ask the same question about many paths, and it is already optimized for the case where it does so in in-order recursive descent of a tree hierarchy by having an attr_stack that reads, pre-parses and caches contents of the .gitattributes files found in each directory hierarchy. The cached optimization data that sits in "struct git_attr_check" would be effective for this expected access pattern. The optimization data cannot be file-scope static to attr.c, even though that might be easier to implement, in anticipation of having more than one "struct git_attr_check" working alternatingly, e.g. having more than one pathspec with ":(attr=X)" pathspec magic, i.e. git cmd ":(attr=X)dir/" ":(attr=Y)dir/ectory/" Each pathspec element is expected to keep its own "struct git_attr_check" in the custom data to support pathspec magic to match with attributes, and the optimization data kept there would survive repeated calls to git_check_attr() for the same path with different set of attributes. The patches in the earliest part of the series have been sent to the list already; there is no substantial change (I think I made a typofix in the commit log message found by Eric). commit.c: use strchrnul() to scan for one line attr.c: use strchrnul() to scan for one line attr.c: update a stale comment on "struct match_attr" attr.c: explain the lack of attr-name syntax check in parse_attr() attr.c: complete a sentence in a comment attr.c: mark where #if DEBUG ends more clearly attr.c: simplify macroexpand_one() Step 8 is to make sure I can catch all the existing callers and update the API incrementally. attr: rename function and struct related to checking attributes Step 9 is the most important one. By updating the API to allow the callers hold piece of information more than just a simple array of <attr, value> pair, it paves the way to add data structures that will help optimizing the lookup. As a demonstration of the new API, it converts one caller. The immediate effect to the callers is that they can lose moderate amount of code, but that is not the point of this change. attr: (re)introduce git_check_attr() and struct git_attr_check The two patches that follow converts remaining callers to the enw API. attr: convert git_all_attrs() to use "struct git_attr_check" attr: convert git_check_attrs() callers to use the new API The last step retires the old one and updates the document. attr: retire git_check_attrs() API Documentation/technical/api-gitattributes.txt | 62 ++++++----- archive.c | 24 ++--- attr.c | 143 +++++++++++++++++++------- attr.h | 31 ++++-- builtin/check-attr.c | 55 +++++----- builtin/pack-objects.c | 19 +--- commit.c | 3 +- convert.c | 26 ++--- ll-merge.c | 33 +++--- userdiff.c | 19 ++-- ws.c | 19 ++-- 11 files changed, 236 insertions(+), 198 deletions(-) -- 2.8.2-748-gfb85f76 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html