I was looking at this loop in traverse_trees() from tree-walk.c for (i = 0; i < n; i++) { if (!t[i].size) continue; entry_extract(t+i, entry+i); if (last >= 0) { int cmp = entry_compare(entry+i, entry+last); /* * Is the new name bigger than the old one? * Ignore it */ if (cmp > 0) continue; /* * Is the new name smaller than the old one? * Ignore all old ones */ if (cmp < 0) mask = 0; } mask |= 1ul << i; if (S_ISDIR(entry[i].mode)) dirmask |= 1ul << i; last = i; } The logic to update last breaks down while merging these three trees: Tree #1: tree "t" Tree #2: blob "t" blob "t-f" Tree #3: blob "t-f" tree "t" When looking at Tree #3, "last" points at "t" (blob) from Tree #2 and we decide "t-f" comes later than that to ignore it because "cmp > 0". We instead somehow need to look ahead and find "t", while remembering to revisit "t-f" from it in the later rounds (of an outer loop that has this loop inside). Also the second comparison to update last breaks down while merging these: Tree #1: blob "t-f" tree "t" Tree #2: blob "t" Tree #3: blob "t" blob "t-f" When looking at Tree #2, "last" points at "t-f" (blob) from Tree #1 and we decide to use "t" because it sorts earlier. We will match "t" from Tree #3 but we somehow need to go back to Tree #1 and look beyond "t-f" to find the matchig "t" from there as well, while remembering that we need to revisit "t-f" from it in the later rounds.. I am thinking about changing the function this way. (0) take an optional "candidate for the earliest name" from the caller, in a new member in traverse_info structure. unpack_callback() would give the path of the index entry it is looking at after stripping the leading paths as necessary. (1) find the "earliest name" N among the given trees (and the optional additional candidate from the above). The "earliest" is computed by comparing names as if everything were a blob; (2) for each tree t[i], if the current entry entry[i] compares differently with N between the cases where the N were a blob and where N were a tree, and if entry[i] is a blob, a tree with the name N might be hidden behind it. remember the current position so we can come back, and scan forward to find such tree (this does not have to run to the end of the tree), and if we do find one, then instead of saying "we do not have an entry with that name", use that tree in entry[i]. This will collect entries with name N from all trees in the entry[] array; (3) make the callback as usual; unpack_callback(), after processing this round with entries with name N, may or may not advance o->pos but if it did so, it would also update the "candidate for the earliest name" field passed back in the traverse_info structure to prepare for the next round. (4) traverse_trees() will go back to step (0). The tricky part is step (1) and the latter half of step (2) to skip, rewind, and avoid re-processing what was processed already after rewinding, and the progress is much slower than I would like. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html