Re: [RFC PATCH 05/15] read_tree_recursive: Avoid missing blobs and trees in a sparse repository

On Sat, Sep 4, 2010 at 9:16 PM, Elijah Newren <newren@xxxxxxxxx> wrote:
> On Sat, Sep 4, 2010 at 8:00 PM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
>> On Sun, Sep 5, 2010 at 10:13 AM, Elijah Newren <newren@xxxxxxxxx> wrote:
>>> @@ -119,6 +119,11 @@ int read_tree_recursive(struct tree *tree,
>>>                default:
>>>                        return -1;
>>>                }
>>> +
>>> +               if (git_sparse_pathspecs &&
>>> +                   sha1_object_info(entry.sha1, NULL) <= 0)
>>> +                       continue;
>>> +
>>
>> I suppose this is temporary and the final solution would have stricter
>> checks (i.e. only ignore those entries if they are outside the sparse
>> zone)? This opens the door to a broken repo.
>
> Yes, good catch.  Looks like I somehow missed that one, but I agree,
> there should be an "&& !tree_entry_interesting(...)" in there.

Sorry, now I remember why that isn't in there and why it can't be.  I
did have it there at one point.  However, base and baselen in
read_tree_recursive() do not necessarily correspond to the path
relative to the toplevel of the repository, whereas the sparse limit
pathspecs always do.  For example, when running
  git ls-tree master:Documentation/technical
or, equivalently (for current git.git),
  git ls-tree da0ae7c59bb0df4c13554fd840e1a563cde659ea
base will be "" for paths under Documentation/technical rather than
"Documentation/technical/".  And there's really no way of determining
the "real base" in order to match against the sparse limit pathspecs.
(Well... I guess you could do an exhaustive walk of all history each
time, so long as the given sha1sum only ever appears as one particular
directory, but that's unrealistic and slow, and it leaves open what to
do when directories at different paths happen to have the same sha1sum
at some point in their history.)  Note that this affects cat-file -p as
well, since for trees it calls ls-tree and thus this code.
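To make the base/baselen problem concrete, here is a minimal C sketch.
in_sparse_zone() and join_path() are hypothetical helpers, and the
matcher is a plain directory-prefix check, not git's
tree_entry_interesting().  It only shows that the same entry matches a
sparse pathspec of "Documentation" when base is
"Documentation/technical/", but not when base is "", as it is when
ls-tree is handed the subtree directly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified sparse matcher (NOT tree_entry_interesting()):
 * a path is in the sparse zone if it equals one of the prefixes or lies
 * underneath one.  Prefixes are toplevel-relative, with no trailing '/'.
 */
static int in_sparse_zone(const char *full_path, const char **prefixes)
{
	for (; *prefixes; prefixes++) {
		size_t len = strlen(*prefixes);

		/* Match on a directory boundary: "Documentation" or "Documentation/..." */
		if (!strncmp(full_path, *prefixes, len) &&
		    (full_path[len] == '\0' || full_path[len] == '/'))
			return 1;
	}
	return 0;
}

/* The path a matcher would see: base (as read_tree_recursive() gets it) + name. */
static void join_path(char *out, size_t outlen,
		      const char *base, const char *name)
{
	assert(base != NULL && name != NULL);
	snprintf(out, outlen, "%s%s", base, name);
}
```

With base "Documentation/technical/" and a hypothetical entry
"example.txt" the joined path falls under "Documentation" and matches;
with base "" the joined path is just "example.txt" and does not, even
though it is the very same blob.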

I really don't see how one can change this.  However, if it's any
consolation, sha1_object_info() prints an error message whenever it's
asked for a sha1sum that cannot be found in the object store.  For
example, in a sparse clone:

$ git ls-tree -rt master
040000 tree f98bf35e9a746ebbd5a706591fe1ea4942942bad    sub
040000 tree 436913a91c5648a6ed8fa23719fbd6e3052e2597    sub/a
error: unable to find 436913a91c5648a6ed8fa23719fbd6e3052e2597
040000 tree 07753f428765ac1afe2020b24e40785869bd4a85    sub/b
100644 blob d95f3ad14dee633a758d2e331151e950dd13e4ed    sub/b/file
040000 tree 07753f428765ac1afe2020b24e40785869bd4a85    sub/bcopy
100644 blob d95f3ad14dee633a758d2e331151e950dd13e4ed    sub/bcopy/file
100644 blob 4b7b65e07a8641bcd14375ebddf5c8a7fc002a30    sub/file

It can't traverse into sub/a because it doesn't have the necessary
objects.  I figured the message was useful to the end user as a
reminder that they are in a sparse clone.  (Such messages don't affect
cat-file -p, because the callback it uses never returns
READ_TREE_RECURSIVE, so this part of the code is never reached.)
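Below is a rough model of the traversal behaviour described above,
using a toy in-memory tree rather than git's object store.  struct
node, walk(), WALK_RECURSE and count_and_recurse() are all
hypothetical; the gating on git_sparse_pathspecs is left out, and the
error line is printed directly (with the path, not the sha1) instead
of coming from sha1_object_info().  As in the patch, the
missing-object check is only reached when the callback asks to
recurse, and a missing subtree is reported and skipped while its
siblings are still walked.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a tree object; NOT git's struct tree. */
struct node {
	const char *name;
	int is_tree;
	int present;                 /* 0 = object missing locally (sparse clone) */
	const struct node *children; /* array ended by an entry with name == NULL */
};

enum { WALK_SKIP = 0, WALK_RECURSE = 1 };  /* hypothetical callback results */

typedef int (*walk_fn)(const struct node *entry, const char *base, void *ctx);

/*
 * Call fn on every entry; descend into a subtree only if fn returns
 * WALK_RECURSE.  Missing subtrees are reported and skipped, not fatal.
 * Returns the number of subtrees skipped because they were missing.
 */
static int walk(const struct node *tree, const char *base,
		walk_fn fn, void *ctx)
{
	const struct node *e;
	int skipped = 0;

	assert(tree != NULL && base != NULL);
	for (e = tree->children; e && e->name; e++) {
		char path[256];

		if (fn(e, base, ctx) != WALK_RECURSE || !e->is_tree)
			continue;
		/* Only reached when recursing, like the check in the patch. */
		if (!e->present) {
			fprintf(stderr, "error: unable to find %s%s\n",
				base, e->name);
			skipped++;
			continue;
		}
		snprintf(path, sizeof(path), "%s%s/", base, e->name);
		skipped += walk(e, path, fn, ctx);
	}
	return skipped;
}

/* Example callback: count every entry seen and recurse into trees. */
static int count_and_recurse(const struct node *entry, const char *base,
			     void *ctx)
{
	(void)base;
	++*(int *)ctx;
	return entry->is_tree ? WALK_RECURSE : WALK_SKIP;
}
```

Feeding it a trimmed-down version of the sub/ listing above (sub/a
missing, sub/b/file and sub/file present, sub/bcopy omitted) visits
five entries, prints one "unable to find" line for sub/a, and
returns 1.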

