On Tue, Apr 29, 2014 at 5:46 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes: > >> diff --git a/cache.h b/cache.h >> index 0f6247c..90a5998 100644 >> --- a/cache.h >> +++ b/cache.h >> @@ -135,6 +135,7 @@ struct cache_entry { >> unsigned int ce_mode; >> unsigned int ce_flags; >> unsigned int ce_namelen; >> + unsigned int index; /* for link extension */ >> unsigned char sha1[20]; >> char name[FLEX_ARRAY]; /* more */ >> }; > > I am not sure if we want to keep an otherwise unused 8-byte around > per cache entry (especially for a large project where the split > index mode should matter) after we read the index. > > I expected to see some code where entries in this incremental index > are used to override the entries from the base/shared index, but > merge_base_index() seems to do just memcpy() to discard the former > and replace them with the latter. Is this step meant to work at > all, or is it a smaller step meant to be completed in later patches? This field only matters at write time, not read time. It's to quickly detect if an entry is shared, see prepare_to_write_split_index(). > I do think it is sensible to keep two arrays of "struct cache_entry" > around (one for base and one for incremental changes) inside > index_state, and the patch seems to do so via "struct split_index" > that does have a copy of saved_cache. If the write-out codepath > walks these two sorted arrays in parallel, shouldn't it be able to > figure out which entry is added, deleted and modified without > fattening this structure? So far without that "index" field I would have to resort to hasing entries in both arrays to find the shared paths. But ideas are welcome. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html