On Wed, Jan 25, 2023 at 8:16 AM William Sprent <williams@xxxxxxxxxxx> wrote: > > On 25/01/2023 06.11, Elijah Newren wrote: > > It looks like Ævar and Victoria have both given really good reviews > > already, but I think I spotted some additional things to comment on. > > > > On Mon, Jan 23, 2023 at 3:46 AM William Sprent via GitGitGadget > > <gitgitgadget@xxxxxxxxx> wrote: > >> > >> From: William Sprent <williams@xxxxxxxxxxx> > >> > >> There is currently no way to ask git the question "which files would be > >> part of a sparse checkout of commit X with sparse checkout patterns Y". > >> One use-case would be that tooling may want know whether sparse checkouts > >> of two commits contain the same content even if the full trees differ. > > > > Could you say more about this usecase? Why does tooling need or want > > to know this; won't a checkout of the new commit end up being quick > > and simple? (I'm not saying your usecase is bad, just curious that it > > had never occurred to me, and I'm afraid I'm still not sure what your > > purpose might be.) > > > > I'm thinking mainly about a monorepo context where there are a number of > distinct 'units' that can be described with sparse checkout patterns. > And perhaps there's some tooling that only wants to perform an action if > the content of a 'unit' changes. So, you're basically wanting to do something like git ls-tree --paths-matching-sparsity-file=<pattern-file> $COMMIT1 >sparse-files git ls-tree --paths-matching-sparsity-file=<pattern-file> $COMMIT2 >>sparse-files sort sparse-files | uniq >relevant-files git diff --name-only $COMMIT1 $COMMIT2 >changed-files and then checking whether relevant-files and changed-files have a non-empty intersection? Would that potentially be better handled by git diff --name-only $COMMIT1 $COMMIT2 | git check-ignore --ignore-file=<pattern-file> --stdin and seeing whether the output is non-empty? We'd have to add an "--ignore-file" option to check-ignore to override reading of .gitignore files and such, and it'd be slightly confusing because the documentation talks about "ignored" files rather than "selected" files, but that's a confusion point that has been with us ever since the gitignore mechanism was repurposed for sparse checkouts. Or maybe we could just add a check-sparsity helper, and then allow it to take directories in-lieu of patterns. This seems nicer than opening a can of worms about letting every git command specify a different set of sparsity rules. > Depending on the repo, it won't necessarily be quick to check out the > commit with the given patterns. However, it is more about it being > inconvenient to have to have a working directory, especially so if you > want use the tooling in some kind of service or query rapidly about > different revisions/patterns. > > >> Another interesting use-case would be for tooling to use in conjunction > >> with 'git update-index --index-info'. > > > > Sorry, I'm not following. Could you expound on this a bit? > > > > I was imagining something along the lines of being able to generate new > tree objects based on what matches the given sparse checkout patterns. > Not that I have a specific use case for it right now. > > I think what I'm trying to evoke with that paragraph is that this > enables integrations with git that seem interesting and weren't possible > before. I'm not sure if it's interesting, frightening, or something else. Hard to say without better descriptions of usecases, which we can't have if we don't even have a usecase. I think I'd just strike this paragraph. [...] > >> + (*d)->pl.use_cone_patterns = core_sparse_checkout_cone; > > > > Hmm, so the behavior still depends upon the current sparse-checkout > > (or lack thereof), despite the documentation and rationale of your > > feature as being there to check how a different sparse checkout would > > behave? > > > > I would hate to unconditionally turn cone_patterns off, since that > > would come with a huge performance penalty for the biggest repos. But > > turning it unconditionally on wouldn't be good for the non-cone users. > > This probably suggests we need something like another flag, or perhaps > > separate flags for each mode. Separate flags might provide the > > benefit of allowing cone mode users to specify directories rather than > > patterns, which would make it much easier for them to use. > > > I used 'core_sparse_checkout_cone' because I wanted to allow for the > cone mode optimisations, but I also figured that I should respect the > configuration. It doesn't change how the patterns are parsed in this case. > > I agree that it is a bit awkward to have to "translate" the directories > into patterns when wanting to use cone mode. I can try adding > '--[no]-cone' flags and see how that feels. Together with Victoria's > suggestions that would result in having the following flags: > > * --scope=(sparse|all) > * --sparse-patterns-file=<path> > * --[no]-cone: used together with --sparse-patterns-file to tell git > whether to interpret the patterns given as directories (cone) or > patterns (no-cone). > > Which seems like a lot at first glance. But it allows for passing > directories instead of patterns for cone mode, and is similar to the > behaviour of 'sparse-checkout set'. > > Does that seem like something that would make sense? --sparse-patterns-file still implies patterns; I think that would need some rewording. More importantly, though, based on your usecase description, I wonder if you might be better served by either extending the check-ignore subcommand or adding a similar helper ("check-sparsity"?), rather than tweaking ls-tree.