On Tue, Jan 15, 2019 at 12:55 AM Matthieu Moy <git@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> ...
>
> You may suggest ideas by editing the wiki page, or just by replying to
> this email (I'll point my students to the thread). Don't hesitate to
> remove entries (or ask me to do so) on the wiki page if you think they
> are not relevant anymore.

I just mentioned this elsewhere [1], but let me summarize it here
because I think this could be an interesting thing to do, and once you
understand the attr.c code it's not that hard. The student would need
to understand git attributes and how they are implemented in attr.c.
But that's about it. More background below, but the summary line is
"optimize attribute lookup to be proportional to the number of
attributes queried, not the number of attributes present in
.gitattributes files".

So, we normally look up the same set of attributes over a long list of
paths. We do this by building up an "attribute stack" containing all
attribute info collected from all related .gitattributes files.
Whenever we move from one path to the next, we update the stack
slightly (e.g. if the previous path is a/b/c and the current one is
a/d/e, we need to delete attributes from a/b/.gitattributes from the
stack, then add ones from a/d/.gitattributes).

Looking up is just a matter of going through this stack, finding the
attribute lines that match the given path, then getting the attribute
value.

This approach will not scale well. Assume that you have a giant
.gitattributes file (or the same content spread over many files) with
a zillion random attributes and two lines about the "love" attribute.
When you look up this "love" attribute you may end up going through
all those attribute lines.

[2] hints at a better approach in the comment near
cannot_trust_maybe_real. If you know you are looking for "love", then
when you build up the attribute stack, just keep "love" and ignore
everything else [3]. This way, the attribute stack that we need to
look through will have only the two lines about "love".
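
To make the filtering idea concrete, here is a minimal sketch in C. It
is not the actual attr.c code: struct attr_line, filter_lines(),
lookup() and the sample data are hypothetical names made up for
illustration, patterns are simplified to plain path prefixes, and
macros are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one line of a .gitattributes file. */
struct attr_line {
	const char *prefix; /* simplified pattern: a plain path prefix */
	const char *attr;   /* attribute name */
	const char *value;  /* attribute value */
};

/*
 * Build the reduced stack: copy only the lines that mention "wanted".
 * "out" must have room for "nr" entries. Returns the number of lines kept.
 */
size_t filter_lines(const struct attr_line *all, size_t nr,
		    const char *wanted, struct attr_line *out)
{
	size_t kept = 0;

	assert(wanted != NULL);
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(all[i].attr, wanted))
			out[kept++] = all[i];
	return kept;
}

/*
 * Scan from the end so that later (deeper) lines win.
 * Returns NULL when no line in the stack matches the path.
 */
const char *lookup(const struct attr_line *stack, size_t nr,
		   const char *path)
{
	for (size_t i = nr; i-- > 0; )
		if (!strncmp(path, stack[i].prefix, strlen(stack[i].prefix)))
			return stack[i].value;
	return NULL;
}

/* Sample data: five lines, only two of them about "love". */
const struct attr_line sample[] = {
	{ "a/",   "text", "auto" },
	{ "a/",   "love", "yes"  },
	{ "a/b/", "eol",  "lf"   },
	{ "a/b/", "love", "no"   },
	{ "a/d/", "diff", "c"    },
};
```

With this sample, filtering for "love" keeps two of the five lines, so
lookup() compares each path against two prefixes instead of five.
Filtering for an attribute that appears nowhere leaves an empty stack,
and lookup() returns NULL without comparing any path at all.
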
Lookup time is of course now much faster. In the best possible case,
when you look for an attribute that is not defined anywhere in the
.gitattributes files in your repo, you get an instant "not found"
response because the attribute stack is empty. This edge case was
implemented in [4].

[1] https://public-inbox.org/git/20190118165800.GA9956@xxxxxxxxxxxxxxxxxxxxx/T/#m32fef6a9e8f65dffae41e44a62dd76b4a84fa0fe
[2] 7d42ec547c (attr.c: outline the future plans by heavily commenting - 2017-01-27)
[3] Well, macros make it a bit more complex, but I'll leave that as an exercise.
[4] 06a604e670 (attr: avoid heavy work when we know the specified attr is not defined - 2014-12-28)
--
Duy