Re: [PATCH v2] cache-tree: invalidate i-t-a paths after generating trees

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 10 Dec 2012 09:22:40 -0800

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> Yeah I use entry_count for two different things here. In the previous
> (unsent) version of the patch I had "entry_count = -1" right after "i
> -= entry_count"
>
>>> +                     if (sub->cache_tree->entry_count < 0) {
>>> +                             i -= sub->cache_tree->entry_count;
>>> +                             sub->cache_tree->entry_count = -1;
>>> +                             to_invalidate = 1;
>>> +                     }
>
>
> which makes it clearer that the use of negative entry count is only
> valid within update_one. Then I changed my mind because it says
> 'negative means "invalid"' in cache-tree.h.

But you make it to mean not just 'negative means invalid' but
'negate it and you can learn how many index entries to skip to reach
the entry outside the subtree'.  That is what I was wondering about.

I do not think that new invariant does not hold in general; it may
hold true while inside this function, but immediately after somebody
else calls cache_tree_invalidate_path() it won't be true anymore.
Leaking the information to outside callers that can easily be
mistaken as if it may mean something without any comment looks like
we are asking for trouble.

> So, put "entry_count = -1"
> back or just add another field  to struct cache_tree for this?

As long as it is made clear with in-code comment that says "negative
entry count, when negated, will give us how many index entries are
covered by this invalid cache tree, only inside this function", and
such a negative entry_count is reset to -1 to avoid leaking out the
value that will soon go stale to the outside world in the "write
this tree out" loop, I think the v2 patch is OK, if not ideal.

If we were to commit to keep the new invariant true throughout the
system, on the other hand, I suspect that a boolean flag "valid" may
help people who keep i-t-a entries around across write-tree calls.
The first if () statement in the update_one() function can check the
validity, and you can say the cache tree is valid even if the
section in the index that is covered by the cache-tree contains
i-t-a entries _after_ you wrote it out as a tree, no?  That may make
the semantics a lot cleaner, I suspect.

The new semantics might be:

 * update_one() returns negative only when the section of the index
   given to it cannot be written out as a tree (i.e. not merged, or
   corrupt index);

 * update_one() returns the number of entries in the index covered
   by the tree, including i-t-a entries (but not REMOVED, as these
   entries will not be in the index after the cache-tree has been
   written out);

 * cache_tree->valid tells if the cache_tree->sha1 is valid; the
   section of the index the tree covers may or may not have i-t-a
   entries in it;

 * cache_tree->entry_count holds the number of index entries the
   tree covers, couting i-t-a entries. This field is only meaningful
   when cache_tree->valid is true.

or something like that.

I noticed that the recent "ignore i-t-a only while writing trees
instead of erroring out" broke 331fcb5 (git add --intent-to-add: do
not let an empty blob be committed by accident, 2008-11-28) in
another way, by the way.  In verify_cache(), there is an unreachable
else clause to warn about "not added yet" entries that should have
been removed but is left behind.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html