Re: [PATCH 0/4] Make unpack_trees() do separate source and destination indexes

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 7 Mar 2008 09:46:16 -0800 (PST)

On Thu, 6 Mar 2008, Junio C Hamano wrote:
> 
> What I have been wondering about this series was if we can re-enable
> cache-tree for more of the unpack_trees() users.
> 
> Currently, all unpack_trees() users, other than a single-tree read-tree,
> invalidates the whole cache-tree information, as Daniel's "pop one, decide
> what to put back, all in the original index" had too many manual index
> manipulations and sprinkling cache_tree_invalidate() call everywhere was
> too much cluttering of already hard-to-follow codepath.

It should work as-is, but not optimally.

The reason I say that is that the result-tree won't have any cache tree 
*at*all* with this series, which is a possible peformance poblem (but 
there are performance upsides to the series too, so I'm not sure it's all 
that noticeable). So the "source" index will keep its cache-tree 
unmodified, because it simply never got modified in the first place.

Of course, all _current_ users will have src_index == dst_index (except 
for the "diff_oneway" thing that doesn't have a destination at all, 
because it doesn't actually want any index manipulation), so for all 
current users this is all pure downside: we throw away the good cache_tree 
for the incoming index, and we aren't even able to use it for the result.

But because I envisioned that the people who really *care* will be the 
ones that want to have different incoming and outgoing indexes, I didn't 
actually worry about it. Ie all the "git status" and "git commit" stuff 
would presumably end up using different indexes, and then they can re-use 
the (unmodified) source, and that would generally be the one that they 
actually want.

But I dunno. Maybe we'd actually want to try to re-use the cache_tree from 
the source some way, if we normally actually want it in the destination.

So this series really isn't a final thing, but I do think that in the form 
I posted it, it's a step forward in (a) readability/maintainability and 
(b) flexibility. And I think all the really *core* stuff got converted, 
but now there's a lot of small details hanging around that just don't take 
advantage of the new capabilities.

I basically tried to keep things bug-for-bug compatible. I also didn't try 
to optimize some things that I know I'll want to eventually.

For example, since we now create a new "index_state" entirely, that means 
that we may need to re-hash the cache entries in the name hash, and that 
means that we now allocate a new one *and*copy*it* rather than just re-use 
cache entries directly.

So that's the reason for the new add_entry() function that actually 
allocates a new cache_entry. And it actually has room for various very 
obvious optimizations that I did *not* do:

 - if the cache_entry comes from a tree (common), just re-use it, since it 
   wasn't hashed in the first place
 - if "o->dst_index" is NULL, just don't do anything, because we'll not 
   use the result anyway.
 - if "o->dst_index == o->src_index" and the cache_entry doesn't have 
   CE_HASHED set, just re-use it directly.

adn things like that. So there were a lot of small micro-optimizations 
that I didn't do, because quite frankly, this conversion was really 
painful in the first place (the set of patches I sent out may have 
_looked_ fairly clean, but that's only because I didn't show you the pain 
it was to create them ;)

		Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html