Re: [PATCH 15/32] unpack_trees: only unpack $GIT_DIR/narrow subtree in narrow repository

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> · Wed, 25 Aug 2010 15:38:54 +1000

On Wed, Aug 25, 2010 at 3:04 PM, Elijah Newren <newren@xxxxxxxxx> wrote:
> Hi,
>
> 2010/8/24 Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>:
>> By definition, narrow repository is incomplete. It does not even have
>> enough tree for a single commit. So populating a full index is
>> impossible.
>>
>> Because of this, unpack_trees() is modified to only unpack trees
>> within $GIT_DIR/narrow, which narrow repo has all needed trees. This
>> makes the resulting index unsuitable for creating commits later on.
>> This is the reason index version is increased to 4, to avoid older
>> git from using it.
>>
>> The resulting tree objects created from the index is only part of the
>> full tree. Manipulation will be needed at commit time to create proper
>> tree for commits.
>
> I spent a while thinking about this a couple weeks ago and never came
> to a strong conclusion about which of two alternatives should be
> preferred; I'm curious why you decided to go for this solution.  An
> alternative I thought of was having the index have entries for missing
> files (whose contents did not exist in the repository or the working
> copy; rather all we know is the filename and its sha1sum) and also
> gain the ability to have entries for missing trees (which behave
> similarly; all we know is their name and their sha1sum, but the
> contents of that sha1sum are not in the repository or the working
> directory)  Is there a reason to prefer one alternative over the
> other?  Does the alternative I thought of even make sense?

I tried that, and it was a nightmare. I had to deal with
unpack_callback() and skip the paths I was not interested in, and that
function is, I think, quite heavily optimized. I tried another
approach, putting directories in the index in the hope that it would
cut down the number of code paths I'd have to touch. It did, but then
the index sorting order is just weird and I couldn't get it right (it
felt too risky).
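
To illustrate the sorting problem, here is a minimal Python sketch (not
git code). It relies on two facts about git: index entries are ordered
by a bytewise comparison of the full path, while inside a tree object a
directory sorts as if its name ended in "/". The entry names and the
idea of storing a directory in the index under its plain name are
assumptions made only for this example.

```python
def index_key(path: str) -> bytes:
    # Index order: plain bytewise comparison of the path.
    return path.encode()


def tree_key(name: str, is_dir: bool) -> bytes:
    # Tree-object order: a directory compares as if it were "name/".
    return (name + "/").encode() if is_dir else name.encode()


# Hypothetical entries: one directory and two files with similar names.
entries = [("foo", True), ("foo.c", False), ("foo-bar", False)]

index_order = [n for n, _ in sorted(entries, key=lambda e: index_key(e[0]))]
tree_order = [n for n, _ in sorted(entries, key=lambda e: tree_key(*e))]

print(index_order)
print(tree_order)
```

With these names the directory comes first under the index rule and
last under the tree rule, because "-" and "." both compare lower than
"/". Any code that walks the index and trees in parallel has to cope
with that mismatch.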

unpack_trees() is used by diff (all diffs except diff_tree_sha1).
traverse_trees()/unpack_trees() is also used by the merge strategies.
All that code is so complicated that I'd rather stay away from it.

That was before I went with tree rewrites. Soon after, I realized that
a nice effect of tree rewrites is that the index is narrowed down, so
I can get rid of tree rewrites as long as the index is still narrow.
That's how I came to this approach.
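
As a rough picture of what a "narrow" index means, here is a minimal
Python sketch (not the patch itself). It assumes the narrow area is a
single path prefix, as would be recorded in $GIT_DIR/narrow; the
function names and the sample paths are hypothetical.

```python
def in_narrow(path: str, prefix: str) -> bool:
    # True if path is the narrow prefix itself or lies underneath it.
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def narrow_index(paths, prefix):
    # Keep only the entries inside the narrow subtree.
    return [p for p in paths if in_narrow(p, prefix)]


# Hypothetical full list of paths and a hypothetical narrow prefix.
all_paths = [
    "Documentation/git.txt",
    "Documentation/howto/a.txt",
    "Documentation-old/x",
    "Makefile",
]
kept = narrow_index(all_paths, "Documentation")
print(kept)
```

The comparison appends "/" to the prefix so that a sibling such as
"Documentation-old" is not mistaken for part of the narrow subtree.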

Back to your questions: I think the alternative makes sense; it just
looks like a lot of work and is quite intrusive. On the other hand, my
current approach is quite simple (but it probably won't work for more
than a single narrow tree).
-- 
Duy