Duy Nguyen <pclouds@xxxxxxxxx> writes: > On Mon, Aug 6, 2018 at 5:48 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: >> > I've also checked about the lookahead thing in unpack_trees() to see >> > if we accidentally break something there, which is my biggest worry. >> > See [1] and [2] for context, but I believe since we can't have D/F >> > conflicts, the situation where lookahead is needed will not occur. So >> > we should be safe. I think you would want the same "switch cache-bottom before descending into a subdirectory, and then restore cache-bottom after traversal comes back" dance that is done for the normal tree traversal case to happen. bottom = switch_cache_bottom(&newinfo); ret = traverse_trees(n, t, &newinfo); restore_cache_bottom(&newinfo, bottom); During your walk of the index and the trees that are known to be in sync, there is little reason to worry about the cache_bottom, which is advanced by calling mark_ce_used() in traverse_by_cache_tree(). Where it matters is what happens after the traversal comes back out of the subtree. find_cache_pos() uses the bottom pointer so that it does not have to go back to far to find an index entry that has not been used to match with the entries from the trees (which are not sorted exactly the same way as the index, unfortunately), so forgetting to advance the bottom pointer while correctly marking a ce as "used" is OK (i.e. hurts performance but not correctness), but advancing the bottom pointer too much and leaving entries that are not used behind is *not* OK. And lack of restoring the bottom in the new codepath makes me suspect exactly such a bug _after_ the traversal exits the subtree we are using this new optimization in and moves on. Imagine we are iterating over the top-level of the trees, and found a subtree in them. There may be some index entries before the first path in this subtree that are not yet marked as "used", the earliest of which is pointed at by the cache_bottom pointer. Before descending into the subtree (and start consuming the entry with the first in this subtree from the index), we stash away the current cache_bottom, and then start walking the subtree. While we are in that subtree, the cache_bottom starts from the first name in the subtree and increments, as we _know_ the entry at the old cache_bottom is outside this subtree and will not match any entry form the subtree. Then when the traversal returns, all index entries within the "subtree/" path will be marked "used". At that point, when we continue to scan the top-level of the trees, we need to restore the cache_bottom, so that we do not forget entries that we knew we needed to scan eventually, if there was any.