From: "Jonathan Nieder" <jrnieder@xxxxxxxxx>
Sent: Monday, July 17, 2017 7:03 PM
Hi Philip,
Philip Oakley wrote:
From: "Jonathan Tan" <jonathantanmy@xxxxxxxxxx>
These patches are part of a set of patches implementing partial clone,
as you can see here:
https://github.com/jonathantanmy/git/tree/partialclone
[...]
If I understand correctly, this method doesn't give any direct user
visibility of missing blobs in the file system. Is that correct?
I was hoping that eventually the various 'on demand' approaches
would still allow users to continue to work as they go off-line such
that they can see directly (in the FS) where the missing blobs (and
trees) are located, so that they can continue to commit new work on
existing files.
I had felt that some sort of 'gitlink' should be present (human
readable) as a place holder for the missing blob/tree, e.g.
'gitblob: 1234abcd' (showing the missing oid, just like sub-modules
can do; it's no different really).
That's a reasonable thing to want, but it's a little different from
the use cases that partial clone work so far has aimed to support.
They are:
A. Avoiding downloading all blobs (and likely trees as well) that are
not needed in the current operation (e.g. checkout). This blends
well with the sparse checkout feature, which allows the current
checkout to be fairly small in a large repository.
True. In my case I was looking for a method that would allow a 'Narrow
clone' such that the local repo would be smaller (have less content), but
would feel as if all the useful files/directories were available, and there
would be place holders at the points where the trees were pruned, both in
the object store, and in the user's work-tree.
As you say, in some ways it's conceptually orthogonal to the original sparse
checkout (which has a full width object store / repo, and then omits files
from the checkout).
GVFS uses a trick that makes it a little easier to widen a sparse
checkout upon access of a directory. But the same building blocks
should work fine with a sparse checkout that has been set up
explicitly.
B. Avoiding downloading large blobs, except for those needed in the
current operation (e.g. checkout).
When not using sparse checkout, the main benefit out of the box is
avoiding downloading *historical versions* of large blobs.
It sounds like you are looking for a sort of placeholder outside the
sparse checkout area.
True.
In a way, that's orthogonal to these patches:
even if you have all relevant blobs, you may want to avoid inflating
them to check them out and reading them to compare to the index (i.e.
the usual benefits of sparse checkout).
In my concept, it should be possible to create the ('sparse'/narrow) index
from the content of the local object store, without any network connection
(though that content is determined by the prior fetch/clone;-). The proper
git sparse checkout could proceed from there as being a further local
restriction on what is omitted from the worktree.
Those missing from the narrow clone would still show as place holders with
content ".gitnarrowtree 13a24b..<oid>" (so we know what the hash oid of the
file/tree should be, so they can be moved/renamed etc!). The index would
only know the content/structure as far as the place holders (just like
sub-modules are a break point in the tracking, with identical caveats).
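
As a rough illustration of the place holder idea, here is a minimal
Python sketch that writes and reads such a one-line place holder. This
is not an existing Git feature. The ".gitnarrowtree" prefix comes from
the description above; the ".gitnarrowblob" variant, the full 40-hex
oid, and both function names are assumptions made for the example.

```python
import re

# Hypothetical place holder format: ".gitnarrowtree <oid>" or
# ".gitnarrowblob <oid>". Only the "tree" form is mentioned in the
# thread; the "blob" form and the 40-hex oid length are assumptions.
_OID_RE = re.compile(r"[0-9a-f]{40}")
_PLACEHOLDER_RE = re.compile(r"\.gitnarrow(tree|blob) ([0-9a-f]{40})")


def make_placeholder(kind, oid):
    """Return the text a work-tree place holder file would contain."""
    if kind not in ("tree", "blob"):
        raise ValueError("kind must be 'tree' or 'blob'")
    if _OID_RE.fullmatch(oid) is None:
        raise ValueError("oid must be 40 lowercase hex digits")
    return ".gitnarrow{} {}\n".format(kind, oid)


def parse_placeholder(content):
    """Return (kind, oid) for a place holder, or None for ordinary content."""
    match = _PLACEHOLDER_RE.fullmatch(content.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Because the oid is kept in the place holder, a tool that moves or
renames the file can carry the missing object's identity along without
needing the object itself.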
It would be interesting to know from Ben what level of sparseness/narrowness
has typically been seen in the BigWin GVFS repo case.
In a sparse checkout, you
still might like to be able to get a listing of files outside the
sparse area (which you can get with "git ls-tree") and you may even
want to be able to get such a listing with plain "ls" (as with your
proposal).
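
To make the "git ls-tree" route concrete, here is a small Python sketch
that parses default "git ls-tree" output (one "<mode> <type> <object>",
a tab, then the path per line) and keeps the entries that fall outside
a set of sparse directories. The function names and the sparse prefix
list are hypothetical, and the sketch assumes paths without unusual
characters (which ls-tree would otherwise quote unless -z is used).

```python
def parse_ls_tree(output):
    """Parse default `git ls-tree` output into (mode, type, oid, path) tuples."""
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        # Metadata and path are separated by a single tab.
        meta, path = line.split("\t", 1)
        mode, objtype, oid = meta.split(" ")
        entries.append((mode, objtype, oid, path))
    return entries


def outside_sparse_area(entries, sparse_prefixes):
    """Return the entries whose path is not under any sparse directory."""
    prefixes = [p.rstrip("/") for p in sparse_prefixes]

    def inside(path):
        return any(path == p or path.startswith(p + "/") for p in prefixes)

    return [entry for entry in entries if not inside(entry[3])]
```

Feeding it the output of "git ls-tree -r HEAD" together with the
directories in the sparse checkout would give the listing of files
that exist in the tree but are not checked out.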
Thanks and hope that helps,
Jonathan
Thanks, yes. It has helped consolidate some of the parts of my concept that
have been in the back of my mind for a while now.
Philip