Torsten Bögershausen <tboegi@xxxxxx> writes:

>> There's less redundant work going on than at first seems, because
>> .gitattribute files are only read once and cached. Verified by strace.
>>
> OK, I think I missed that work (not enough time for Git at the moment)
> Junio, please help me out, do we have a cache here now?

The call convert_attrs() makes to the attribute subsystem is:

    git_check_attr(path, NUM_CONV_ATTRS, ccheck)

Conceptually, this call has to do the following, and for the very
first call that is actually what it does:

 1. Read all the relevant attributes files. If you are asking for
    "a/x/1", we need to read $GIT_DIR/info/attributes,
    ".gitattributes", "a/.gitattributes" and "a/x/.gitattributes".

 2. Find matching patterns that cover "a/x/1", and pick up the
    attribute definition from the above.

If you have asked for "a/x/1", it is very likely that you would next
ask for "a/x/2" (think: "git checkout a/x"), and we can realize that
exactly the same set of attributes files apply to that path. So an
obvious optimization is to cache the result of the first step.

In addition, it is also likely that you would later ask for "a/y/3"
before asking for "b/z/4" (think: "git add ."). A part of step 1.
that was done when you asked for "a/x/1", and then was reused when
you asked for "a/x/2", can further be reused for this request, by
discarding only what was read from "a/x/.gitattributes" and reading
only from "a/y/.gitattributes".

The above two optimizations are done in prepare_attr_stack().

Unfortunately this is one of the three reasons why the attribute
subsystem is not thread-ready. I.e. there is only one global cache,
so if you spawn two threads to speed up "git add ." by handing paths
[a-m]* to one and [n-z]* to the other, they would thrash the cache
and make it ineffective (even if we protect the cache with a mutex,
which obviously has not been done). I earlier started looking into
this, but the effort stalled.
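
To make the reuse and the thrashing concrete, here is a rough model
of the two optimizations in Python. It is only a sketch under my own
assumptions, not the C code behind prepare_attr_stack(): the names
attr_sources, AttrStack and count_reads are made up for illustration,
and the "files" are never parsed, only counted as they are read.

```python
from typing import Callable, List, Tuple

INFO_ATTRIBUTES = "$GIT_DIR/info/attributes"


def attr_sources(path: str) -> List[str]:
    """List the attributes files that apply to `path`, outermost first."""
    dirs = path.split("/")[:-1]          # directory components only
    sources = [INFO_ATTRIBUTES, ".gitattributes"]
    for depth in range(1, len(dirs) + 1):
        sources.append("/".join(dirs[:depth]) + "/.gitattributes")
    return sources


class AttrStack:
    """One cached stack of attributes files (hypothetical model)."""

    def __init__(self, read_file: Callable[[str], str]) -> None:
        self._read_file = read_file
        self._stack: List[Tuple[str, str]] = []   # (file name, contents)
        self.reads: List[str] = []                # files actually read

    def prepare(self, path: str) -> List[str]:
        """Make the stack match `path`, reading only what is missing."""
        wanted = attr_sources(path)
        # Count the leading frames shared with the previous request.
        keep = 0
        while (keep < len(self._stack) and keep < len(wanted)
               and self._stack[keep][0] == wanted[keep]):
            keep += 1
        # Discard frames for directories we have left ...
        del self._stack[keep:]
        # ... and read only the files for directories we have entered.
        for name in wanted[keep:]:
            self.reads.append(name)
            self._stack.append((name, self._read_file(name)))
        return [name for name, _ in self._stack]


def count_reads(paths: List[str]) -> int:
    """Count file reads for a sequence of requests on one shared stack."""
    stack = AttrStack(lambda name: "")   # pretend every file is empty
    for p in paths:
        stack.prepare(p)
    return len(stack.reads)
```

In this model, count_reads(["a/x/1", "a/x/2", "n/y/1", "n/y/2"])
gives 6, while the interleaved order ["a/x/1", "n/y/1", "a/x/2",
"n/y/2"], which is roughly what two threads sharing one stack would
produce, gives 10: every switch throws away and re-reads the two
innermost files.
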

But for a single-thread use, the attributes read from the filesystem
are cached, and the cache is designed to perform well as long as you
make requests in-order.

To make the attribute look-up thread-ready, the attribute cache
needs to become per-thread.

Orthogonal to the threading issue, there is another possible
optimization that is not there yet. The cache can be tied to what is
in ccheck[] to further reduce the size of the cache, making step 2. a
lot cheaper. Currently in step 1. we read and keep everything, but
if we tie the cache to the contents of ccheck[], we can read and then
ignore the entries from step 1. that do not talk about the attributes
ccheck[] is interested in.

My plan is to either

 (1) make the cache per-thread, limit the reading done in 1. to
     ccheck[], but invalidate the cache when a different set of
     attributes is asked for; or

 (2) make the cache per <thread, ccheck[]>.
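
As an illustration of the ccheck[] idea and of option (2), here is a
small Python sketch. Again this is an assumption-laden model and not
Git's implementation: parse_attr_line, read_rules and cache_for are
hypothetical names, the parser handles only a simplified subset of
the attributes file syntax (no macros, no quoted patterns), and the
set of wanted attributes stands in for whatever ccheck[] holds.

```python
import threading
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[str, Dict[str, str]]    # (pattern, {attribute: state})


def parse_attr_line(line: str) -> Rule:
    """Parse 'pattern attr -attr !attr attr=value' (simplified)."""
    pattern, *tokens = line.split()
    attrs: Dict[str, str] = {}
    for tok in tokens:
        if tok.startswith("-"):
            attrs[tok[1:]] = "unset"
        elif tok.startswith("!"):
            attrs[tok[1:]] = "unspecified"
        elif "=" in tok:
            name, value = tok.split("=", 1)
            attrs[name] = value
        else:
            attrs[tok] = "set"
    return pattern, attrs


def read_rules(text: str, wanted: FrozenSet[str]) -> List[Rule]:
    """Read one attributes file, keeping only what `wanted` asks about."""
    rules: List[Rule] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                     # blank line or comment
        pattern, attrs = parse_attr_line(line)
        kept = {k: v for k, v in attrs.items() if k in wanted}
        if kept:                         # drop lines that are irrelevant
            rules.append((pattern, kept))
    return rules


# Option (2): one cache per <thread, wanted attributes>.
_per_thread = threading.local()


def cache_for(wanted: FrozenSet[str]) -> Dict[str, List[Rule]]:
    """Return this thread's cache (file name -> rules) for `wanted`."""
    caches = getattr(_per_thread, "caches", None)
    if caches is None:
        caches = _per_thread.caches = {}
    return caches.setdefault(wanted, {})
```

With wanted = frozenset({"text", "eol"}), a file containing
"*.txt text eol=lf", "*.png -text -diff" and "*.c diff=cpp" is
reduced to two rules, and the "*.c" line is not kept at all, so the
matching in step 2. has less to walk through. Option (1) would
instead keep a single per-thread cache and clear it whenever the
wanted set changes.
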