Re: Questions about partial clone with '--filter=tree:0'

On 21.10.2020 0:29, Taylor Blau wrote:
Oops. That can happen sometimes, but thanks for re-sending. I'll try to
answer the basic points below.

Thanks for stepping in!

(1) Is it even considered a realistic use case?
-----------------------------------------------
Summary: is '--filter=tree:0' a realistic or "crazy" scenario that is
not considered worthy of supporting?

It's not an unrealistic scenario, but it might be for what you're trying
to build. If your UI needs to run, say, 'git log --patch' to show a
historical revision, then you're going to need to fault in a lot of
missing objects.

If that's not something that you need to do often or ever, then having
'--filter=tree:0' is a good way to get the least amount of data possible
when using a partial clone. But if you're going to be performing
operations that need those missing objects, you're probably better off
eating the network/storage cost of it all at once, rather than making
the user wait for Git to fault in the set of missing objects that it
happens to need.

We currently do not intend to use '--filter=tree:0' ourselves, but we
are trying to support all kinds of user repositories with our UI. So we
basically have these choices:

A) Declare '--filter=tree:0' repos as completely wrong and unsupported
   in our UI, also giving an option to "un-partial" them.

B) Support '--filter=tree:0' repos, but don't support operations such
   as blame and file log.

C) Use some magic to efficiently download objects that will be needed
   for a command such as Blame, while keeping the rest of the repository
   partial. This is where the command described in (3) will help a lot.

We would of course prefer (C) if it's reasonably possible.

(2) A command to enrich repo with trees
---------------------------------------
There is no good way to "un-partial" a repository that was cloned with
'--filter=tree:0' so that it has all trees, but no blobs.

There is no command to do that directly, but it is something that Git is
capable of.

It would look something like:

   $ git config remote.origin.partialclonefilter 'blob:none'

Now your repository is in a state where it has no blobs or trees, but
the filter does not prohibit it from getting the trees, so you can ask
it to grab everything you're missing with:

   $ git fetch origin

This should even be a pretty fast operation for repositories that have
bitmaps due to some topics that Peff and I sent to the list a while ago.
If it isn't, please let me know.

Unfortunately this does not work as expected. Try the following steps:

A) Clone repo with '--filter=tree:0'
   $ git clone --bare --filter=tree:0 --branch master https://github.com/git/git.git

B) Change filter to 'blob:none'
   $ cd git.git
   $ git config remote.origin.partialclonefilter 'blob:none'

C) Fetch
   $ git fetch origin
   Note that there is no 'Receiving objects:' output.

D) Verify that trees were downloaded
   $ git cat-file -p HEAD | grep tree
     tree ee5b5b41305cda618862beebc9c94859ae276e5a
   $ git cat-file -t ee5b5b41305cda618862beebc9c94859ae276e5a
     Note that 1 object gets downloaded. This confirms that (C) didn't
     achieve the goal.

This happens because of the 'check_exist_and_connected()' test in
'fetch_refs()'. Since the tip of the ref is already available locally
(even though it is missing all trees), nothing is downloaded.
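
One way to check the result of step (C) without triggering a lazy fetch,
as 'git cat-file' does in step (D), is to ask 'git rev-list' to report
missing objects instead of fetching them. This is only a sketch: it
assumes a Git version that supports '--missing=print', and the function
name 'count_missing' is made up for illustration.

```shell
# Hypothetical helper: print how many objects reachable from HEAD are
# absent from the local object store of the repository at "$1".
count_missing() {
    # With --missing=print, rev-list lists each absent object as
    # "?<oid>" and does not fetch it from the promisor remote.
    git -C "$1" rev-list --objects --missing=print HEAD |
        grep -c '^?'
}
```

If 'count_missing git.git' prints the same non-zero number before and
after step (C), the fetch did not bring in any trees.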

There seems to be a dirty way of doing that by abusing 'fetch --deepen',
which happens to skip the "ref tip already present locally" check, but
it will also re-download all commits, which means roughly 0.5 GB of
extra network traffic in the case of the Linux repo.

Mmm, this is probably not what you're looking for. You may be confusing
shallow clones (to which --deepen is relevant) with partial clones
(to which --deepen is irrelevant).

Yes, '--deepen' is intended for shallow clones. But abusing it for
partial clones makes it possible to skip the
'check_exist_and_connected()' test. However, I did more testing today,
and in many cases the server itself refuses to send objects, probably
due to the 'HAVE' lines that were sent or something else. So even
'--deepen' doesn't really help.

I think what you probably want is a step 1.5 to tell Git "I'm not going
to ask for or care about the entirety of my working copy, I really just
want objects in path...", and you can do that with sparse checkouts. See
https://git-scm.com/docs/git-sparse-checkout for more.

For simplicity of discussion, let's focus on the problem of running
Blame efficiently in a repo that was cloned with '--filter=tree:0'. In
order to blame file '/1/2/Foo.txt', we will need the following:

* Trees '/1'
* Trees '/1/2'
* Blobs '/1/2/Foo.txt'

All of these will be needed to an unknown commit depth. For simplicity,
the proposed command will download these for all commits. Specifying
a range of revisions could be nice, but I feel that it's not worth the
complexity.
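
To make the list above concrete, here is a small sketch of how the set
of paths could be derived from the file being blamed. It is plain shell
and does not talk to Git at all; the function name 'needed_paths' is
made up for illustration and is not part of any existing or proposed
Git command.

```shell
# Hypothetical helper: print every directory prefix of a repository
# path (the trees), then the path itself (the blob), one per line.
needed_paths() {
    path=${1#/}              # drop a leading slash, if any
    prefix=
    rest=$path
    # Loop while "$rest" still contains a '/'.
    while [ "${rest#*/}" != "$rest" ]; do
        part=${rest%%/*}                 # first remaining component
        prefix=${prefix:+$prefix/}$part  # append it to the prefix
        echo "$prefix"
        rest=${rest#*/}                  # strip that component
    done
    echo "$path"
}
```

For '/1/2/Foo.txt' this prints '1', '1/2' and '1/2/Foo.txt', which are
the paths whose objects the proposed command would have to send.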

Correct me if I'm wrong: I think that sparse checkout will not help to
achieve the goal?

This is why I suggest a command that will accept paths and send the
requested objects, also forcing the server to assume that all of them
are missing in the client's repository.


