Elijah Newren <newren@xxxxxxxxx> writes:

> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@xxxxxxxxx> wrote:
>>
>> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>>
>> While comparing an index to a tree, we may see a sparse directory entry.
>> In this case, we should compare that portion of the tree to the tree
>> represented by that entry. This could include a new tree which needs to
>> be expanded to a full list of added files. It could also include an
>> existing tree, in which case all of the changes inside are important to
>> describe, including the modifications, additions, and deletions. Note
>> that the case where the tree has a path and the index does not remains
>> identical to before: the lack of a cache entry is the same with a sparse
>> index.
>>
>> In the case where a tree is modified, we need to expand the tree
>> recursively, and start comparing each contained entry as either an
>> addition, deletion, or modification. This causes an interesting
>> recursion that did not exist before.
>
> So, I haven't read through this in detail yet...but there's a big
> question I'm curious about:
>
> Git already has code for comparing an index to a tree, a tree to a
> tree, or a tree to the working directory, right? So, when comparing a
> sparse-index to a tree...can't we re-use the compare a tree to a tree
> code when we hit a sparse directory?

Offhand I do not think of a reason why that cannot work.

The tree-diff machinery takes two trees, walks them in parallel, and
repeatedly calls either diff_addremove() or diff_change(), each of
which appends a diff_filepair to the diff_queue[] structure.
If you see an unexpanded tree on the index side, you should be able to
pass that tree, together with the subtree you are comparing it against,
to the tree-diff machinery to come up with a series of filepairs. Such
a two-tree comparison compares two vintages of a single subdirectory,
so the pathnames in these filepairs are relative to that subdirectory;
you would then tweak them (e.g. prefix them with the directory's path)
before adding them to the diff_queue[] in which you are collecting the
index-vs-tree diff.

But if a part of the index is represented as a tree because it is
outside the cone of interest, should we even be showing the difference
in that part of the tree? If the t/ directory is outside the cone of
interest, should "git diff HEAD~100 HEAD t/" show anything to begin
with (and the same question applies to "git diff --cached HEAD t/")?