Re: Students projects: looking for small and medium project ideas

Duy Nguyen <pclouds@xxxxxxxxx> · Tue, 22 Jan 2019 17:09:34 +0700

On Tue, Jan 15, 2019 at 12:55 AM Matthieu Moy <git@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> ...
>
> You may suggest ideas by editting the wiki page, or just by replying to
> this email (I'll point my students to the thread). Don't hesitate to
> remove entries (or ask me to do so) on the wiki page if you think they
> are not relevant anymore.

I just mentioned this elsewhere [1], but let me summarize it here
because I think it could be an interesting project, and once you
understand the attr.c code it's not that hard to do. The student would
need to understand git attributes and how they are implemented in
attr.c, but that's about it. More background below, but the summary
line is "optimize attribute lookup to be proportional to the number of
attributes queried, not the number of attributes present in
.gitattributes files".

So, we normally look up the same set of attributes over a long list of
paths. We do this by building up an "attribute stack" containing all
attribute info collected from all related .gitattributes files.
Whenever we move from one path to the next, we update the stack
slightly (e.g. if the previous path is a/b/c and the current one is
a/d/e, we need to delete attributes from a/b/.gitattributes from the
stack, then add ones from a/d/.gitattributes). Looking up is just a
matter of going through this stack, finding the attribute lines that
match the given path, then getting the attribute value.
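
To make the stack update concrete, here is a rough sketch in Python
(not the actual attr.c code). Everything in it is made up for
illustration: the ATTR_FILES table stands in for .gitattributes files
on disk, the AttrStack class and its method names are hypothetical,
and pattern matching is reduced to fnmatch on the file name, which is
much simpler than what git really does.

```python
from fnmatch import fnmatch

# Hypothetical stand-in for .gitattributes files, keyed by directory.
# Each rule is (pattern, attribute, value). Not real git data structures.
ATTR_FILES = {
    "": [("*", "text", "auto")],
    "a": [("*.c", "diff", "cpp")],
    "a/b": [("*", "love", "yes")],
    "a/d": [("*.c", "love", "no")],
}


def dirs_for(path):
    """Directories leading to path, outermost first: '', 'a', 'a/b'."""
    parts = path.split("/")[:-1]
    return [""] + ["/".join(parts[:i + 1]) for i in range(len(parts))]


class AttrStack:
    def __init__(self):
        self.frames = []  # list of (directory, rules), outermost first

    def prepare(self, path):
        """Pop frames not shared with the new path, push the missing ones."""
        wanted = dirs_for(path)
        keep = 0
        while (keep < len(self.frames) and keep < len(wanted)
               and self.frames[keep][0] == wanted[keep]):
            keep += 1
        del self.frames[keep:]
        for d in wanted[keep:]:
            self.frames.append((d, ATTR_FILES.get(d, [])))

    def lookup(self, path, attr):
        """Walk the whole stack; the innermost directory and later lines win."""
        self.prepare(path)
        name = path.rsplit("/", 1)[-1]
        for _, rules in reversed(self.frames):
            for pattern, a, value in reversed(rules):
                if a == attr and fnmatch(name, pattern):
                    return value
        return None


stack = AttrStack()
first = stack.lookup("a/b/c", "love")      # frames: '', 'a', 'a/b'
second = stack.lookup("a/d/e.c", "love")   # 'a/b' popped, 'a/d' pushed
```

Note that lookup() visits every rule in every frame until it finds a
match, whatever attribute the rule is about.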

This approach will not scale well. Assume that you have a giant
.gitattributes file (or rules spread over many files) with a zillion
random attributes and only two lines about the "love" attribute. When
you look up "love" you may end up going through all of those
attribute lines. [2] hints at a better approach in the comment near
cannot_trust_maybe_real. If you know you are looking for "love", when
you build up the attribute stack, just keep "love" and ignore
everything else [3]. This way, the attribute stack that we need to
search holds only the two lines about "love", and lookup is of course
much faster. In the best possible case, when you look for an attribute
that is not defined anywhere in the .gitattributes files in your repo,
you get an instant "not found" response because the attribute stack is
empty. This edge case was implemented in [4].
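
A toy illustration of the filtering idea, again in Python and again
not the real attr.c code: build_frame(), lookup(), and the rule tuples
are hypothetical names, macros are ignored entirely, and matching is
plain fnmatch. The only point is to count how many rules a lookup has
to visit with and without filtering at stack-build time.

```python
from fnmatch import fnmatch


def build_frame(rules, wanted=None):
    """Keep only rules whose attribute is in `wanted` (None keeps all)."""
    if wanted is None:
        return list(rules)
    return [r for r in rules if r[1] in wanted]


def lookup(frames, name, attr):
    """Return (value, number_of_rules_visited)."""
    visited = 0
    for rules in reversed(frames):
        for pattern, a, value in reversed(rules):
            visited += 1
            if a == attr and fnmatch(name, pattern):
                return value, visited
    return None, visited


# Hypothetical giant attribute file: 1000 unrelated attributes plus two
# lines about "love". Each rule is (pattern, attribute, value).
noise = [("*.bin", f"attr{i}", "set") for i in range(1000)]
rules = [("*.c", "love", "yes")] + noise + [("*.h", "love", "no")]

full = [build_frame(rules)]                       # everything kept
filtered = [build_frame(rules, wanted={"love"})]  # two rules kept
empty = [build_frame(rules, wanted={"hate"})]     # nothing kept

print(lookup(full, "x.c", "love"))      # walks past all the noise
print(lookup(filtered, "x.c", "love"))  # only sees the "love" lines
print(lookup(empty, "x.c", "hate"))     # empty stack, instant "not found"
```

The filtering cost is paid once when the frame is built, which is why
it pays off when the same attributes are looked up over a long list
of paths.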

[1] https://public-inbox.org/git/20190118165800.GA9956@xxxxxxxxxxxxxxxxxxxxx/T/#m32fef6a9e8f65dffae41e44a62dd76b4a84fa0fe
[2] 7d42ec547c (attr.c: outline the future plans by heavily commenting
- 2017-01-27)
[3] well, macros make it a bit more complex, but I'll leave that as an exercise.
[4] 06a604e670 (attr: avoid heavy work when we know the specified attr
is not defined - 2014-12-28)
-- 
Duy