On Fri, Oct 10, 2008 at 3:48 PM, Paul Mackerras <paulus@xxxxxxxxx> wrote: > Alexander Gavrilov writes: > >> When the diff contains thousands of files, calling git-check-attr >> once per file is very slow. With this patch gitk does attribute >> lookup in batches of 30 files while reading the diff file list, >> which leads to a very noticeable speedup. > > Why only 30 at a time? The logic would be simpler if cache_gitattr > just did all the paths in one call to git check-attr, and it should be > able to cope with 1000 paths in one call, I would think, which is the > most that gettreediffline will give it. OS-enforced command-line size limit on Windows is 32K. Cramming in 1000 paths would leave only 32 characters for each path. > Also, I wonder why we now have two levels of caching of the encoding > attribute. Your patch 1/4 introduced path_encoding_cache, which was > fine, but now we have path_attr_cache as well, which seems to me to > serve exactly the same function since the encoding is the only > attribute we ever ask about. Surely we don't need both caches? If the (git-gui) patch that reimplements the tcl_encoding procedure is applied, we may drop the path_encoding_cache. Current implementation is too slow for batch lookup, especially if the encoding is actually not supported, and without the cache the lookup would be done on every loading of a diff. > Even with this batching, I am a bit concerned that adding the encoding > support might make things noticeably slower for people who don't need > any encoding support (which would be the majority, I think). We may > end up needing an option to turn off the checking of the encoding > attribute. I hope that most diffs don't contain thousands of files at once. And actual huge diffs are likely to be relatively slow to load anyway. Alexander -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html