Re: [PATCH 05/16] Hook up replace-object to allow bulk commit replacement

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 02 Aug 2010 12:58:23 -0700

I really do not like the use of "replace" for the purpose of narrow
clones.  While "replace" is about fixing a mistake by tweaking trees, a
desire to have a narrow clone at this moment is _not_ a mistake.  You may
want to have a wider or full clone of the project tomorrow.  You may want to
push the result of committing on top of such a narrowed clone back to a
full repository.  My gut feeling is that using "replace" to stub out
the objects that you do not currently have would make it a nightmare when
you would want to widen (especially to widen over the wire while pushing
into a full repository on the other end), although I haven't looked at all
the patches in the series.

Can you back up a bit and give us a high-level overview of how various
operations in a narrowed clone should work, and how you achieve that
design goal?

Let's take an example of starting from git.git and narrow-clone only its
Documentation/ (as you seem to have used as a guinea-pig) subdirectory.
For the sake of simplicity, let's say the upstream project has only one
commit.

One plausible approach would be to have the commit, its top level tree
object, its Documentation/ tree object and all the blobs below that level,
while other blobs and trees that are reachable from the top level tree
object are left missing, but somehow are marked so that fsck would think
they are OK to be missing.  Your worktree would obviously be narrowed to
the same Documentation/ area, and unlike the narrow checkout codepath, you
do not widen on demand (unless you automatically fetch missing parts of
the tree, which I do not think you should do by default to help people who
work while at 30,000ft).  Instead, any operation that tries to modify
outside the "subtree" area should fail.
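
To make "should fail" concrete, here is a rough Python sketch (not git
code; check_in_area and OutsideNarrowArea are names made up for this
illustration) of the kind of path check I have in mind, assuming the
narrowed area is a single directory prefix:

```python
import posixpath


class OutsideNarrowArea(Exception):
    """Raised when an operation would touch a path outside the narrowed area."""


def check_in_area(path, area="Documentation"):
    # Normalize first so "Documentation/../Makefile" cannot sneak through.
    norm = posixpath.normpath(path)
    if norm != area and not norm.startswith(area + "/"):
        raise OutsideNarrowArea(path)
    return norm
```

Anything that updates the index or the worktree would call such a check
before doing its work, and refuse instead of widening on demand.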

When you build a commit that represents a Documentation patch on top of
such a narrowed clone, because you have a full tree of Documentation/
area, you can come up with the updated tree object for that part of the
project.  If "subtree" mode (aka narrowed clone) rejects operations outside
the cloned area, your commit is guaranteed to touch only Documentation/
area and nothing outside.  You therefore should be able to compute the
tree object for the whole repository (i.e. all the other entries in the
top level tree object should be the same as those from HEAD).
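
As a sketch of why this works: a tree object's name depends only on the
(mode, name, object name) triples it lists, not on the contents behind
them.  The Python below is an illustration, not git code; the helper
names and the placeholder object names ("aa" * 20 and so on) are made
up, and SHA-1 object names are assumed.

```python
import hashlib


def _sort_key(entry):
    name, mode, _sha = entry
    # Tree entries sort as if directory names ended in "/".
    return (name + "/" if mode == "40000" else name).encode()


def hash_tree(entries):
    """Object name of a tree made of (name, mode, 40-hex object name) entries."""
    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(sha)
        for name, mode, sha in sorted(entries, key=_sort_key)
    )
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()


def with_replaced_entry(entries, name, new_sha):
    """Copy of entries with one entry's object name swapped; the rest untouched."""
    if all(n != name for n, _mode, _sha in entries):
        raise KeyError(name)
    return [(n, m, new_sha if n == name else s) for n, m, s in entries]


# Top-level tree of HEAD; only Documentation/ is actually present locally.
head_top = [
    ("Documentation", "40000", "aa" * 20),
    ("Makefile", "100644", "bb" * 20),
    ("ppc", "40000", "cc" * 20),
]
# After committing a Documentation patch, only that one entry changes.
new_top = with_replaced_entry(head_top, "Documentation", "dd" * 20)
new_top_name = hash_tree(new_top)
```

Note that neither the Makefile blob nor anything under ppc/ has to
exist in the object store for this computation.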

Because the index is a flat structure, you would need to fudge the entries
that are missing-but-OK in there somehow, _and_ you would need to be able
to recompute the tree after updating Documentation/ area.  E.g. you may
know ppc/ is tree db31c066 but may not know that it has three blobs
underneath it nor what their object names are, so your index operating in
this mode would need to record (ppc -> db31c066) mapping in order to be
able to recreate the tree object out of it.
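
One way to picture such an index, as a Python sketch only (the
IndexEntry type, its "stub" flag and the fold() helper are all
hypothetical, the blob names are placeholders, and hashing each level
into a tree object is left out):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    path: str           # blob path, or a directory path when stub is True
    mode: str
    sha: str            # abbreviated here, as in the text above
    stub: bool = False  # tree known by name only; contents not present


def fold(index):
    """Turn the flat entry list into nested per-directory dicts.

    A stub stays a leaf: its recorded tree name is used as-is.
    """
    level = {}
    for e in index:
        head, sep, rest = e.path.partition("/")
        if not sep:
            level[head] = ("stub-tree" if e.stub else "blob", e.sha)
        else:
            level.setdefault(head, []).append(
                IndexEntry(rest, e.mode, e.sha, e.stub))
    return {
        name: ("tree", fold(v)) if isinstance(v, list) else v
        for name, v in level.items()
    }


index = [
    IndexEntry("Documentation/git.txt", "100644", "1111aaaa"),
    IndexEntry("Documentation/howto/x.txt", "100644", "2222bbbb"),
    IndexEntry("ppc", "40000", "db31c066", stub=True),
]
nested = fold(index)
```

Here nested["ppc"] comes out as ("stub-tree", "db31c066") without
anybody having to know what is underneath it.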

Using the cache-tree data structure might help in doing this.  It so far has
been an optimization (i.e. when it says it has up-to-date information,
it does, but if it doesn't you can always recompute what is needed from
the flat index entries), but I would imagine that you can add an "out of
cloned area" bit to cache-tree entries, and mark a subtree that represents
missing parts (e.g. 'ppc/') as such---anything that tries to invalidate
such a cache-tree entry would be an error anyway, and when you need to
write the index out as a tree, such cache-tree entries that record the
trees outside your cloned area can be reused, no?
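
A toy model of that idea in Python, to show the intended behaviour
rather than the real cache-tree code (the CacheTree class, its
out_of_area flag and OutOfAreaError are made up for this sketch; a sha
of None stands for "invalid, must be recomputed"):

```python
class OutOfAreaError(Exception):
    """Raised when something tries to invalidate a subtree we do not have."""


class CacheTree:
    def __init__(self, sha=None, out_of_area=False):
        self.sha = sha                  # None: invalid, recompute on write
        self.out_of_area = out_of_area  # True: outside the cloned area
        self.children = {}

    def invalidate(self, path):
        # Collect every node on the way to path before changing anything,
        # so a rejected update leaves the cache-tree untouched.
        nodes, node = [self], self
        for part in path.split("/"):
            node = node.children.get(part)
            if node is None:
                break
            nodes.append(node)
        if any(n.out_of_area for n in nodes):
            raise OutOfAreaError(path)
        for n in nodes:
            n.sha = None

    def reusable(self):
        """Names of direct subtrees whose recorded tree can be written as-is."""
        return sorted(n for n, c in self.children.items() if c.sha is not None)


root = CacheTree(sha="top0")            # placeholder object names
root.children["Documentation"] = CacheTree(sha="doc0")
root.children["ppc"] = CacheTree(sha="db31c066", out_of_area=True)

root.invalidate("Documentation/git.txt")  # allowed: inside the cloned area
```

After the call, the root and Documentation/ need recomputing while
ppc/ is still reusable; an attempt to invalidate "ppc/anything" raises
instead.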
