Re: [PATCH 05/22] read-cache: add index reading api

Thomas Gummerer <t.gummerer@xxxxxxxxx> · Mon, 08 Jul 2013 13:40:01 +0200

Duy Nguyen <pclouds@xxxxxxxxx> writes:

> On Sun, Jul 7, 2013 at 3:11 PM, Thomas Gummerer <t.gummerer@xxxxxxxxx> wrote:
>> Add an api for access to the index file.  Currently there is only a very
>> basic api for accessing the index file, which only allows a full read of
>> the index, and lets the users of the data filter it.  The new index api
>> gives the users the possibility to use only part of the index and
>> provides functions for iterating over and accessing cache entries.
>>
>> This simplifies future improvements to the in-memory format, as changes
>> will be concentrated on one file, instead of the whole git source code.
>>
>> Signed-off-by: Thomas Gummerer <t.gummerer@xxxxxxxxx>
>> ---
>>  cache.h         |  57 +++++++++++++++++++++++++++++-
>>  read-cache-v2.c |  96 +++++++++++++++++++++++++++++++++++++++++++++++--
>>  read-cache.c    | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>>  read-cache.h    |  12 ++++++-
>>  4 files changed, 263 insertions(+), 10 deletions(-)
>>
>> diff --git a/cache.h b/cache.h
>> index 5082b34..d38dfbd 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -127,7 +127,8 @@ struct cache_entry {
>>         unsigned int ce_flags;
>>         unsigned int ce_namelen;
>>         unsigned char sha1[20];
>> -       struct cache_entry *next;
>> +       struct cache_entry *next; /* used by name_hash */
>> +       struct cache_entry *next_ce; /* used to keep a list of cache entries */
>>         char name[FLEX_ARRAY]; /* more */
>>  };
>
> From what I read, doing
>
>     ce = start;
>     while (ce) { do(something); ce = next_cache_entry(ce); }
>
> is the same as
>
>     i = start_index;
>     while (i < active_nr) { ce = active_cache[i]; do(something); i++; }
>
> What's the advantage of using the former over the latter? Do you plan
> to eliminate the latter loop (by hiding "struct cache_entry **cache;"
> from public index_state structure?

Yes, I wanted to eliminate the latter loop, because it depends on the
in-memory format of the index.  By moving all direct accesses of
index_state->cache to an api it gets easier to change the in-memory
format.  I played a bit with a tree-based in-memory format [1], which
represents the on-disk format of index-v5 more closely, making
modifications and partial-loading simpler.

I've tried switching all those loops to api calls, but that would make
the api too bloated because there is a lot of those loops.  I found it
more sensible to do it this way, leaving the loops how they are, while
making future changes to the in-memory format a lot simpler.

[1] https://github.com/tgummerer/git/blob/index-v5api/read-cache-v5.c#L17
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html