Re: [PATCH 2/2] ls-files: add --sparse option

Elijah Newren <newren@xxxxxxxxx> · Wed, 8 Dec 2021 09:04:18 -0800

On Wed, Dec 8, 2021 at 7:14 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>
> On 11/22/2021 9:07 PM, Ævar Arnfjörð Bjarmason wrote:
> >
> > On Tue, Nov 16 2021, Derrick Stolee via GitGitGadget wrote:
>
> Things in the dependent topics are starting to simmer down, so I'm
> back revisiting this topic.
>
> >> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> >> [...]
> >> +test_expect_success 'ls-files' '
> >> +    init_repos &&
> >> +
> >> +    # Behavior agrees by default. Sparse index is expanded.
> >> +    test_all_match git ls-files &&
> >> +
> >> +    # With --sparse, the sparse index data changes behavior.
> >> +    git -C sparse-index ls-files --sparse >sparse-index-out &&
> >> +    grep "^folder1/\$" sparse-index-out &&
> >> +    grep "^folder2/\$" sparse-index-out &&
> >> +
> >> +    # With --sparse and no sparse index, nothing changes.
> >> +    git -C sparse-checkout ls-files --sparse >sparse-checkout-out &&
> >> +    grep "^folder1/0/0/0\$" sparse-checkout-out &&
> >> +    ! grep "/\$" sparse-checkout-out &&
> >
> > I think all of this would be much clearer both in terms of explaining
> > this change, and also for future test relability if it did away with the
> > selective grepping, and simply ran tls-files with and without --sparse,
> > and then test_cmp'd the full output (after munging away the OIDs).
> >
> > I.e. the sort of output that's in my just-sent reply to the CL:
> > https://lore.kernel.org/git/211123.86lf1fwrq5.gmgdl@xxxxxxxxxxxxxxxxxxx/
> >
> > We really don't need to optimize for lines of tests added, and having
> > ~30 lines of plainly understood diff output is IMO preferrable to even 5
> > lines of tricky positive & negative grep invocations that take some time
> > to reason about and understand.
> >
> > I.e. something like:
> >
> >     cat >expected <<-\EOF &&
> >      100644 blob OID   e
> >      100644 blob OID   folder1-
> >      100644 blob OID   folder1.x
> >     -040000 tree OID   folder1/
> >     +100644 blob OID   folder1/0/0/0
> >     +100644 blob OID   folder1/0/1
> >     +100644 blob OID   folder1/a
> >      100644 blob OID   folder10
> >     -040000 tree OID   folder2/
> >     +100644 blob OID   folder2/0/0/0
> >     +100644 blob OID   folder2/0/1
> >     +100644 blob OID   folder2/a
> >      100644 blob OID   g
> >     -040000 tree OID   x/
> >     +100644 blob OID   x/a
> >      100644 blob OID   z
> >     EOF
> >     git [...] ls-files --sparse >actual.raw &&
> >     [munge away OIDs] <actual.raw >actual &&
> >     test_cmp expected actual
> >
> > Would test everything you're trying to test here and more (would need 2x
> > of those..), and would be easier to read & understand.

The loss of checking for trees would be bad; the point of testing a
sparse-index is that it has tree objects in it.  However, the basic
suggestion inspired Stolee's variant below that does check for trees.
So...

> I don't think it is that hard to understand "I expect to see these
> lines and not these lines" but I am open to more fully verifying
> the full output and demonstrating the change that happens when the
> flag is added.
>
> Taking your idea and applying it to 'ls-files' (without --stage to
> avoid OIDs which would change depending on the hash algorithm), the
> start of the test looks like this:
>
> test_expect_success 'ls-files' '
>         init_repos &&
>
>         # Behavior agrees by default. Sparse index is expanded.
>         test_all_match git ls-files &&
>
>         # With --sparse, the sparse index data changes behavior.
>         git -C sparse-index ls-files --stage >out &&
>         git -C sparse-index ls-files --stage --sparse >sparse &&
>
>         cat >expect <<-\EOF &&
>          e
>          folder1-
>          folder1.x
>         -folder1/0/0/0
>         -folder1/0/1
>         -folder1/a
>         +folder1/
>          folder10
>         -folder2/0/0/0
>         -folder2/0/1
>         -folder2/a
>         +folder2/
>          g
>         -x/a
>         +x/
>          z
>         EOF
>
>         diff -u out sparse | tail -n 16 >actual &&
>         test_cmp expect actual
> '

This actually looks quite nice, though the magic '16' is kind of
annoying.  Could we get rid of that -- perhaps using something to rip
out the diff header, or using comm instead?

Also, perhaps 'dense' rather than 'out'?