Re: [PATCH 4/4] object-name: diagnose trees in index properly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/12/2022 2:32 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> 
>> From: Derrick Stolee <derrickstolee@xxxxxxxxxx>
>>
>> When running 'git show :<path>' where '<path>' is a directory, then
>> there is a subtle difference between a full checkout and a sparse
>> checkout. The error message from diagnose_invalid_index_path() reports
>> whether the path is on disk or not. The full checkout will have the
>> directory on disk, but the path will not be in the index. The sparse
>> checokut could have the directory not exist, specifically when that
>> directory is outside of the sparse-checkout cone.
>>
>> In the case of a sparse index, we have yet another state: the path can
>> be a sparse directory in the index. In this case, the error message from
>> diagnose_invalid_index_path() would erroneously say "path '<path>' is in
>> the index, but not at stage 0", which is false.
>>
>> Add special casing around sparse directory entries so we get to the
>> correct error message. This requires two checks in order to get parity
>> with the normal sparse-checkout case.
> 
> That all makes sense, but let me ask a more basic (and possibly
> stupid) question and a half.

All of these questions are good thoughts. 

>  - When running 'git cmd :<path>' where '<path>' is a directory,
>    even before the "sparse-index" or "sparse-checkout" world, I
>    sometimes wished that ":<path>" resolved to the object name of
>    the tree recorded in the cache-tree, if we have one.  If we are
>    living in the "sparse-index" world, and the paths underneath
>    '<path>' are not in the index (because we are not interested in
>    them), we should be able answer "git rev-parse :<path>" with the
>    name of the tree object.  It is a "new feature" regardless of
>    sparse-index, but...

I also considered "what if we made this 'just work' for trees in the
index?" The issue I found is that we might need to generate the cache-tree
extension when it does not exist (or is not up-to-date).

For this series, I erred on matching existing behavior before
adding such a feature. It is unfortunate that any work to show
trees this way would essentially revert patches 3 and 4 of the
current series.

>    ...I wonder if it is sensible to add more code to
>    engrave in stone that we would not support it and keep treating
>    the index as if it is a flat list of paths to blobs (and commits,
>    for submodules)?

I don't consider any code change to be "engraved in stone", including
these tests that I am adding. Tests document expected behavior. If we
decide to change that expected behavior, then we can change the tests.
Better to have tests that change when we are making a choice to change
behavior than to not have that evidence of a behavior change at all.

Unfortunately, we did not previously have tests for "git show :dir",
so this is the first time we are documenting the behavior.

>  - When running 'git cmd :<path>' where '<path>' is *not* a
>    directory but is not in the index because of "sparse-index"
>    (i.e. a leading directory of '<path>' is represented as a tree in
>    the index), should ":<path>" index expand the "tree" index entry
>    on-demand, so that <path> has its own entry in the index, as if
>    no "sparse-index" is in effect?

There was one case of test_sparse_match that could have been a
test_all_match: "git show :folder1/a". Since folder1/a is discovered in
the index, Git doesn't check the worktree for the existence of the file to
report any different behavior. Changing the test works without any code
change, so I'll update that in v2.

>    The tests I saw in the series
>    were mostly asserted with test_sparse_match, not test_all_match,
>    so I couldn't tell what the expectations are.

The reason for test_sparse_match instead of test_all_match is because the
error messages change depending on the existence of the path in the
worktree. For example "git show :folder1/" in the test script would have
this output difference:

+ diff -u full-checkout-err sparse-checkout-err
--- full-checkout-err   2022-04-12 13:35:51.430805689 +0000
+++ sparse-checkout-err 2022-04-12 13:35:51.430805689 +0000
@@ -1 +1 @@
-fatal: path 'folder1/' exists on disk, but not in the index
+fatal: path 'folder1/' does not exist (neither on disk nor in the index)

>  - More generally, if <leading> path is represented as a
>    "sparse-dir" entry, should ":<leading>/<lower>" cause the index
>    entry for <leading> path to be expanded on-demand?  I am guessing
>    that the answer is yes, as we wouldn't be able to even tell if
>    such a path <leading>/<lower> would exist in the index if the
>    index were converted to full upfront.

This is what happens, but it happens inside index_name_pos(), so there
isn't any change necessary in the builtin to handle this. Tests such as
"git show :folder1/a" demonstrate this behavior.

Now, there could be a _faster_ way to do this if we did not depend
directly on index_name_pos(). Using index_name_pos() will trigger the
index expansion, which is necessary for some compatibility with the sparse
index.

In this case, we don't really care what _position_ the cache entry is in,
we just want to know if the path exists in the index or not.

For that, we could extract a helper method out of the implementation of
index_name_pos() to be something like index_name_exists(). The common code
between the two implementations is to find a position in the index that
either corresponds to the exact path _or_ is a sparse directory (or the
path definitely doesn't exist in the index). From that point,
index_name_exists() could do a tree walk to find the path inside the tree.

All of this is _possible_, but I also understand that running "git show
:<path>" is not common for users to run directly. Instead, I've seen this
used by IDEs to find the staged version of a file. This is not likely to
be run for paths outside of the sparse-checkout cone.

I think that this is not the only place that could benefit from an
index_name_exists() method, but I'm not ready to focus on improving
performance for queries outside of the sparse-checkout cone.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux