On 09/27, Junio C Hamano wrote: > Brandon Williams <bmwill@xxxxxxxxxx> writes: > > > - /* Find common prefix for all pathspec's */ > > - max_prefix = common_prefix(&pathspec); > > + /* > > + * Find common prefix for all pathspec's > > + * This is used as a performance optimization which unfortunately cannot > > + * be done when recursing into submodules > > + */ > > + if (recurse_submodules) > > + max_prefix = NULL; > > + else > > + max_prefix = common_prefix(&pathspec); > > max_prefix_len = max_prefix ? strlen(max_prefix) : 0; > > I still wonder if we can do better than this, as this would be a big > cycle-saver especially in recurse-submodules case. > > When you get max_prefix that is "a/b/c", there are three cases: > > * a/b/c is a path prefix for an entry in the index, e.g. a/b/c/d; > you then can safely use it and you do not have to do any > recursive invocation of ls-files outside "a/b/c". You may match > a/b/c/d in the toplevel, or you may recurse a/b/c/e that is a > submodule, but you won't have to pay attention to submodules > outside. > > * a leading path of a/b/c, e.g. a/b, is a gitlink or a blob in the > index; you can use a/b and you only have to recurse into a/b if > that is a submodule; if a/b is a blob, you'd show nothing. > > * a/b/c itself and no leading path of it appears in the index; you > know that nothing will match once you know that you are in this > situation. > > Because a gitlink "a/b" sorts at the same location in the index as a > regular blob "a/b" would, by feeding the max_prefix common_prefix() > gives you (i.e. "a/b/c") to index_name_pos() to see which one of the > three situations you are in can be done fairly cheaply, I would > think. The index_name_pos() call may find "a/b/c" exactly (case 1), > or return a location where "a/b/c" would be inserted in the list of > existing entries. If there were "a/b" (or "a") in the index, there > wouldn't be any "a/b/x" (or "a/x") at the same time, so a query for > "a/b/c" would land you next to (just after) an existing entry that > is a leading path of it, if such an entry exists, no? That would > allow you to tell case 2 above fairly cheaply, I would expect. > > It is a separate issue if adding that support to 4/4 is a good idea; > I personally think doing it as a separate follow-up patch would make > more sense, so all of the above is tangent. I agree that this could be a big cycle saver. At the present I was working towards getting a working implementation but this should definitely be addressed in a follow-up patch to introduce the optimization to the recurse-submodule mode. It hopefully wouldn't be too hard to implement seeing as its using string literals. -- Brandon Williams