Re: [PATCH v2 00/16] First class shallow clone

"Philip Oakley" <philipoakley@xxxxxxx> · Tue, 23 Jul 2013 23:33:22 +0100

From: "Duy Nguyen" <pclouds@xxxxxxxxx>
Sent: Tuesday, July 23, 2013 2:20 AM
On Tue, Jul 23, 2013 at 6:41 AM, Philip Oakley <philipoakley@xxxxxxx> wrote:
From: "Nguyễn Thái Ngọc Duy" <pclouds@xxxxxxxxx>
Subject: [PATCH v2 00/16] First class shallow clone

It's nice to see that shallow can be a first class clone.

Thinking outside the box, does this infrastructure offer the opportunity
to maybe add a date based depth option that would establish the shallow
watermark based on date rather than count. (e.g. the "deepen" SP depth
could

I've been carefully avoiding the deepen issues because, as you see,
it's complicated. But no, this series does not enable or disable new
deepen mechanisms. They can always be added as protocol extensions.
Still thinking if it's worth exposing a (restricted form of) rev-list
to the protocol..

Interesting idea.

have an alternate with a leading 'T' to indicate a time limit rather
than revision count - I'm expecting such a format would be an error for
existing servers).

My other thought was this style of cut limit list may also allow a big
file limit to do a similar process of listing objects (e.g. blobs) that
are size-shallow in the repo, though it may be a long list on some
repos, or with a small size limit.

This one, on the other hand, changes the "shape" of the repo (now with
holes) and might need to go through the same process we do with this
series. Maybe we should prepare for it now. Do you have a use case for
size-based filtering? What can we do with a repo with some arbitrary
blobs missing? Another form of this is narrow clone, where we cut by
paths, not by blob size. Narrow clone sounds more useful to me because
it's easier to control what we leave out.

In some sense a project with a sub-module is a narrow clone, split at a 
'commit' object. As one use case, there have been comments on the 
git-user list about the problem of accidentally adding large files, 
which then make the repo's footprint pretty large [Git is consuming 
very much RAM]. The bigFileThreshold could be one way of spotting such 
files as separate objects, and 'trimming' them.

It doesn't feel right to 'track files and directories' as paths for 
doing a narrow clone - it'd probably fall into the same trap as tracking 
file renames. However if one tracks trees and blobs (as a list of sha1 
values, possibly with their source path) then it should be possible to 
allow work on the repo with those empty directories/files in the same 
manner as is used for sub-modules, possibly with some form of git-link 
file as an alternate marker.

The thought process is to map sub-module working onto the other object 
types (blobs and trees). The user would be unable to edit the trimmed 
files/directories anyway, so their sha1 values can't change, allowing 
them to be included unchanged in the next commit in the branch series.
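
As a very rough illustration of that idea (a sketch only, in Python 
rather than git's C; the 'trimmed' list and entries_for_commit() are 
made-up names for this example and do not exist in git), the next 
commit could reuse the recorded sha1 for anything trimmed instead of 
hashing content that isn't there:

```python
import hashlib

def blob_sha1(data: bytes) -> str:
    """Blob object id: sha1 over the header 'blob <size>' + NUL + content."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

# Hypothetical "trimmed list": source path -> sha1 of an object left out
# of the clone. The sha1 here is a placeholder value, not a real object.
trimmed = {"assets/video.bin": "0123456789abcdef0123456789abcdef01234567"}

def entries_for_commit(worktree: dict, trimmed: dict) -> dict:
    """Return path -> sha1 for the next commit.

    Present files are hashed from their content; trimmed paths keep the
    sha1 recorded in the trimmed list, since they cannot have been edited.
    """
    entries = {path: blob_sha1(data) for path, data in worktree.items()}
    for path, sha1 in trimmed.items():
        if path in worktree:
            # Content appearing at a trimmed path means the marker was
            # bypassed; refuse rather than guess which sha1 is right.
            raise ValueError(f"{path} is trimmed and cannot be edited")
        entries[path] = sha1
    return entries

entries = entries_for_commit({"README": b"hello\n"}, trimmed)
```

Trees would need the same treatment as blobs; this only shows the blob 
case.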

Philip

--
Duy
--

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html