On Wed, Apr 30, 2014 at 12:23 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Duy Nguyen <pclouds@xxxxxxxxx> writes:
>
>>> I do think it is sensible to keep two arrays of "struct cache_entry"
>>> around (one for base and one for incremental changes) inside
>>> index_state, and the patch seems to do so via "struct split_index"
>>> that does have a copy of saved_cache.  If the write-out codepath
>>> walks these two sorted arrays in parallel, shouldn't it be able to
>>> figure out which entry is added, deleted and modified without
>>> fattening this structure?
>>
>> So far without that "index" field I would have to resort to hashing
>> entries in both arrays to find the shared paths. But ideas are
>> welcome.
>
> Hmm, why do you need to hash, when both arrays are sorted?  Wouldn't
> it be just the matter of walking these two arrays in parallel,
> with one scanning index for each array initialized to the beginning,
> comparing the elements pointed by these indices, noting the side
> that comes earlier in the sort order and advancing the index on that
> side (or if they compare equal then advance both), ...?

That would mean comparing all names and stages (especially in the
unpack-trees case, when no entry is reused). I kinda hope to avoid
that.

Speaking of reusing cache_entry, we won't be able to share a
cache_entry between the two arrays, because when it's freed in
replace_index_entry or remove_index_entry_at in the main index, we
need to locate the same entry in the shared index as well and remove
that stale pointer. Without sharing, we nearly double memory usage
from the beginning.
--
Duy
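
For reference, the parallel walk described in the quoted text could be
sketched roughly as below. This is only an illustration, not code from
the patch: "struct entry", "struct diff_counts", compare_entries() and
diff_sorted() are hypothetical names, the "version" field is a made-up
stand-in for whatever marks an entry's content as changed, and plain
strcmp() is assumed to match the sort order of both arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct cache_entry: sort key plus a change marker. */
struct entry {
	const char *name;
	int stage;
	unsigned int version; /* assumption: differs when the content differs */
};

struct diff_counts {
	size_t added, deleted, modified, unchanged;
};

/* Order by name, then by stage; both arrays are assumed sorted this way. */
static int compare_entries(const struct entry *a, const struct entry *b)
{
	int cmp = strcmp(a->name, b->name);
	if (cmp)
		return cmp;
	return (a->stage > b->stage) - (a->stage < b->stage);
}

/* Walk both sorted arrays in parallel, one scanning index per array. */
static void diff_sorted(const struct entry *base, size_t nr_base,
			const struct entry *cur, size_t nr_cur,
			struct diff_counts *out)
{
	size_t i = 0, j = 0;

	assert(out != NULL);
	memset(out, 0, sizeof(*out));

	while (i < nr_base && j < nr_cur) {
		int cmp = compare_entries(&base[i], &cur[j]);

		if (cmp < 0) {
			/* only in base: the entry was deleted */
			out->deleted++;
			i++;
		} else if (cmp > 0) {
			/* only in the current array: the entry was added */
			out->added++;
			j++;
		} else {
			/* same name and stage on both sides: advance both */
			if (base[i].version != cur[j].version)
				out->modified++;
			else
				out->unchanged++;
			i++;
			j++;
		}
	}

	/* Whatever is left on either side has no counterpart. */
	out->deleted += nr_base - i;
	out->added += nr_cur - j;
}
```

The walk is linear in the total number of entries, but each step still
does a full name and stage comparison, which is the cost the reply
above hopes to avoid.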