Re: [PATCH 1/2] upload-pack: support subtree packing

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> · Wed, 28 Jul 2010 08:29:10 +1000

On Wed, Jul 28, 2010 at 12:46 AM, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:
> Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> wrote:
>> This patch adds a new capability "subtree", which supports two new
>> requests "subtree" and "commit-subtree".
>>
>> "subtree" asks upload-pack to create a pack that contains only blobs
>> from the given tree prefix (and necessary commits/trees to reach
>> those blobs).
>>
>> "commit-subtree" asks upload-pack to create a pack that contains trees of
>> the given prefix (and necessary commits/trees to reach those trees).
>>
>> With the "subtree" request, the Git client may then rewrite commits to
>> create a valid commit tree again, so that users can work on it
>> independently. When users want to push from such a tree,
>> "commit-subtree" may then be used to re-match what users have against
>> what is upstream, and to recreate proper commits to push.
>
> I disagree with a lot of this... but the idea is quite cool.
>
> I like the "subtree" command, being able to clone down only part of
> the repository is a nice feature, and the implementation of subtree
> seems simple enough for the server.  It only has to emit some of
> the paths, but the entire commit DAG.  This is pretty simple to
> implement server side and is very lightweight.

Another point is that the server side can disallow full clones
completely and grant permission to clone on a per-directory basis.
Enterprise users would love this.
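
To make the "subtree" selection concrete, here is a rough sketch of which
objects such a pack would carry for one prefix. It is only an illustration:
the nested-dict tree model and the name select_subtree() are made up for
this mail and are not what the patch does inside upload-pack.

```python
# Toy model (an assumption for illustration): a tree is a dict that maps
# entry names to either a nested dict (sub-tree) or a string (blob).

def select_subtree(root, prefix):
    """Return (trees, blobs): paths a 'subtree' pack would need for prefix.

    Trees on the way down to the prefix are kept so the blobs stay
    reachable; blobs outside the prefix are left out.
    """
    parts = [p for p in prefix.strip("/").split("/") if p]
    trees = [""]          # the root tree is always needed
    blobs = []
    node, path = root, ""

    # Walk down to the prefix, keeping only the trees along the way.
    for part in parts:
        node = node[part]
        if not isinstance(node, dict):
            raise ValueError(f"{prefix!r} is not a directory")
        path = f"{path}/{part}" if path else part
        trees.append(path)

    # Below the prefix, keep everything.
    def walk(tree, base):
        for name, child in sorted(tree.items()):
            child_path = f"{base}/{name}" if base else name
            if isinstance(child, dict):
                trees.append(child_path)
                walk(child, child_path)
            else:
                blobs.append(child_path)

    walk(node, path)
    return trees, blobs
```

Commits are not modelled here; per the discussion above the whole commit
DAG is sent regardless of the prefix.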

> But I disagree with the client rewriting the commits in order to
> work with them locally.  Doing so means you can't take a commit
> from your team's issue tracker and look it up.  And any commit
> you create can't be pushed back to the server without rewriting.
> It's messy for the end-user to work with.

That's what happens with git-subtree in its current form (I don't know
much about git-subtree though). But I guess if they can use
git-subtree as it is now, they can live with subtree clone+git-subtree
just fine.

> I would prefer doing something more like what we do with shallow
> on the client side.  Record in a magic file the path(s) that we
> did actually obtain.  During fsck, rev-list, or read-tree the
> client skips over any paths that don't match that file's listing.
> Then we can keep the same commit SHA-1s, but we won't complain that
> there are objects missing.

That's another option. With all trees present, sparse checkout can be
used, as long as you limit your operations to a subdirectory. Full-tree
commands like git-fsck can be taught to recognize a subtree clone and
stop complaining about missing objects. The downloaded pack would be
bigger (I don't know by how much). It also defeats the enterprise point
above.
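
The "skip paths outside the recorded list" check could look roughly like
the sketch below. This is an assumption about the shape of the idea, not
existing Git code: how the prefixes are stored is left open, and both
function names are hypothetical.

```python
def in_subtree(path, prefixes):
    """True if path is one of the recorded prefixes or lies below one."""
    path = path.strip("/")
    for prefix in prefixes:
        prefix = prefix.strip("/")
        # An empty prefix means the whole tree was cloned.
        if prefix == "" or path == prefix or path.startswith(prefix + "/"):
            return True
    return False


def missing_object_errors(missing_paths, prefixes):
    """Keep only the missing objects that are real errors.

    Objects missing outside the recorded prefixes are expected in a
    subtree clone, so a full-tree checker would stay quiet about them.
    """
    return [p for p in missing_paths if in_subtree(p, prefixes)]
```

For example, with "Documentation/" recorded, a missing
"Documentation/git.txt" is still reported, while a missing
"builtin/add.c" is ignored.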

> The downside is, a lot of the client code is impacted, and that
> is why nobody has done it yet.  Tools like rebase or cherry-pick
> start to behave funny.  What does it mean to rebase or cherry-pick
> a commit that has deltas outside of the area you have cloned?
> It probably should abort and refuse to execute.  But `git show`
> should still work, which implies you need a way to toggle the
> diff code to either skip or fail on deltas outside of the shallow
> path space.

Where do those deltas come from? I thought, with proper path limiting
in upload-pack, pack-objects would never generate anything that needs
things outside the area?

Sounds to me like git-subtree for the short term, and doing without
git-subtree in the long term :)
-- 
Duy