Re: GSoC - Designing a faster index format

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> · Wed, 21 Mar 2012 08:18:58 +0700

On Wed, Mar 21, 2012 at 6:10 AM, elton sky <eltonsky9404@xxxxxxxxx> wrote:
> From the idea, I realize the problem is that the index is verified and
> rewritten on every operation, which is sometimes unnecessary. And the
> objective is to reduce the number of operations to below logN. As I
> am new to git, I couldn't give a detailed plan for this yet. I
> should have gone through more documentation or code, but there's only one
> week left for the application. So I have to jump in from nowhere :P

Understanding the current index format would be a good start, I think:
see Documentation/technical/index-format.txt. For the index-reading code,
look at read_index_from() in read-cache.c (most, if not all, index
manipulation code is in this file).
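To make the start of the file concrete, here is a small C sketch that reads the 12-byte header described in index-format.txt: a 4-byte signature "DIRC", a 4-byte version number and a 32-bit entry count, all in network byte order. The struct and function names are hypothetical (they are not the ones in read-cache.c), and the sketch stops at the header; entries and extensions follow it in the real file.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct holding the 12-byte header from index-format.txt. */
struct index_header_sketch {
	uint32_t version;    /* index format version */
	uint32_t nr_entries; /* number of index entries that follow */
};

/* Decode a 32-bit integer stored in network (big-endian) byte order. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Parse the header at the start of an index file mapped at buf.
 * Returns 0 on success, -1 if the buffer is too short or the
 * signature is not "DIRC". Callers should still reject versions
 * they do not understand.
 */
static int parse_index_header(const unsigned char *buf, size_t len,
			      struct index_header_sketch *hdr)
{
	assert(buf != NULL && hdr != NULL);
	if (len < 12 || memcmp(buf, "DIRC", 4) != 0)
		return -1;
	hdr->version = get_be32(buf + 4);
	hdr->nr_entries = get_be32(buf + 8);
	return 0;
}
```

The first entry starts right after these 12 bytes.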

> I have questions like: how does each operation affect the index?

For the writing part, commands that call refresh_index() can update stat
info for many, many entries. git-add, git-update-index, git-mv and git-rm
can add/remove entries from the index. Merge/checkout operations
(git-reset, git-checkout, git-merge...) can rewrite the whole index. I
think this proposal aims to speed up refresh_index and the add/remove
operations, not the last category.
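For add/remove, it helps to know that index entries are kept sorted by path name (see index-format.txt). The sketch below is a deliberately simplified model, assuming an in-memory array of plain path strings (no stat data, no merge stages), with hypothetical names. Finding an entry is a binary search, O(log n), but inserting or removing one still shifts the tail of the array, O(n), and the whole index file is then written out again.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified index: a sorted array of paths only. */
struct path_index {
	char **paths;
	size_t nr, alloc;
};

/* Binary search: index of path if present, else -(insert position) - 1. */
static long path_pos(const struct path_index *idx, const char *path)
{
	size_t lo = 0, hi = idx->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(path, idx->paths[mid]);
		if (cmp == 0)
			return (long)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -(long)lo - 1;
}

/* Add path: 1 if inserted, 0 if already present, -1 on allocation failure. */
static int path_add(struct path_index *idx, const char *path)
{
	long pos = path_pos(idx, path);
	size_t at, n;
	char *copy;

	if (pos >= 0)
		return 0;
	at = (size_t)(-pos - 1);
	if (idx->nr == idx->alloc) {
		size_t alloc = idx->alloc ? 2 * idx->alloc : 8;
		char **p = realloc(idx->paths, alloc * sizeof(*p));
		if (!p)
			return -1;
		idx->paths = p;
		idx->alloc = alloc;
	}
	n = strlen(path) + 1;
	copy = malloc(n);
	if (!copy)
		return -1;
	memcpy(copy, path, n);
	/* O(n): shift every later entry up by one slot. */
	memmove(idx->paths + at + 1, idx->paths + at,
		(idx->nr - at) * sizeof(*idx->paths));
	idx->paths[at] = copy;
	idx->nr++;
	return 1;
}

/* Remove path: 1 if removed, 0 if it was not present. */
static int path_remove(struct path_index *idx, const char *path)
{
	long pos = path_pos(idx, path);
	size_t at;

	if (pos < 0)
		return 0;
	at = (size_t)pos;
	assert(at < idx->nr);
	free(idx->paths[at]);
	/* O(n): shift every later entry down by one slot. */
	memmove(idx->paths + at, idx->paths + at + 1,
		(idx->nr - at - 1) * sizeof(*idx->paths));
	idx->nr--;
	return 1;
}
```

The "-(pos) - 1" return value lets one search serve both lookups and insertions, so a caller never has to search twice.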

To speed up the reading part (grep for read_cache() to see how many
commands read the index), you may need to do something about the index
integrity check. Currently git calculates the SHA-1 of the entire index,
then checks it against the value stored at the end of the index.
Calculating SHA-1 over a big index can be really expensive.
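The check itself is simple: the last 20 bytes of the index are a SHA-1 over everything before them. The following self-contained C sketch includes a minimal, unoptimized SHA-1 written here for illustration (git uses its own or an external SHA-1 implementation, not this code) plus a hypothetical verify_index_checksum() helper. Every byte of the index is hashed on every verified read, which is why the cost grows with index size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* Minimal one-shot SHA-1 (FIPS 180-4); unoptimized, for illustration only. */
static void sha1(const unsigned char *data, size_t len, unsigned char out[20])
{
	uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE,
			  0x10325476, 0xC3D2E1F0 };
	uint64_t bitlen = (uint64_t)len * 8;
	/* Room for the message, the 0x80 byte and the 8-byte length. */
	size_t total = ((len + 8) / 64 + 1) * 64;
	size_t off, i;

	for (off = 0; off < total; off += 64) {
		unsigned char block[64];
		uint32_t w[80], a, b, c, d, e;

		/* Padded block: data, 0x80, zeros, big-endian bit length. */
		for (i = 0; i < 64; i++) {
			size_t pos = off + i;
			if (pos < len)
				block[i] = data[pos];
			else if (pos == len)
				block[i] = 0x80;
			else if (pos >= total - 8)
				block[i] = (unsigned char)(bitlen >> (8 * (total - 1 - pos)));
			else
				block[i] = 0;
		}
		for (i = 0; i < 16; i++)
			w[i] = ((uint32_t)block[4 * i] << 24) |
			       ((uint32_t)block[4 * i + 1] << 16) |
			       ((uint32_t)block[4 * i + 2] << 8) |
			       (uint32_t)block[4 * i + 3];
		for (i = 16; i < 80; i++)
			w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

		a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
		for (i = 0; i < 80; i++) {
			uint32_t f, k, t;
			if (i < 20) {
				f = (b & c) | (~b & d);
				k = 0x5A827999;
			} else if (i < 40) {
				f = b ^ c ^ d;
				k = 0x6ED9EBA1;
			} else if (i < 60) {
				f = (b & c) | (b & d) | (c & d);
				k = 0x8F1BBCDC;
			} else {
				f = b ^ c ^ d;
				k = 0xCA62C1D6;
			}
			t = rol32(a, 5) + f + e + k + w[i];
			e = d;
			d = c;
			c = rol32(b, 30);
			b = a;
			a = t;
		}
		h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
	}
	for (i = 0; i < 5; i++) {
		out[4 * i] = (unsigned char)(h[i] >> 24);
		out[4 * i + 1] = (unsigned char)(h[i] >> 16);
		out[4 * i + 2] = (unsigned char)(h[i] >> 8);
		out[4 * i + 3] = (unsigned char)h[i];
	}
}

/* Hypothetical helper: 1 if the trailing 20 bytes match the SHA-1 of the
 * rest of the buffer, 0 otherwise. Hashes every byte: O(file size). */
static int verify_index_checksum(const unsigned char *buf, size_t len)
{
	unsigned char digest[20];

	assert(buf != NULL);
	if (len < 20)
		return 0;
	sha1(buf, len - 20, digest);
	return memcmp(digest, buf + len - 20, 20) == 0;
}
```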

> How are the cache tree data and the index stored?

The cache tree is stored as an optional index extension. It's also
documented in index-format.txt, or you can look at cache-tree.[ch].
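Extensions follow the entries and precede the trailing SHA-1. Each one is a 4-byte signature, a 32-bit size in network byte order, then that many bytes of data; the cache tree uses the signature "TREE", and a signature whose first byte is 'A'..'Z' marks an optional extension. Below is a C sketch of walking that area to find one extension, assuming the caller already knows where the entries end (in practice that is only known after parsing every entry). All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit integer stored in network (big-endian) byte order. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Signatures starting with 'A'..'Z' are optional: readers may skip them. */
static int extension_is_optional(const unsigned char *sig)
{
	return sig[0] >= 'A' && sig[0] <= 'Z';
}

/*
 * Scan extensions in buf[start, end), where end is the offset of the
 * 20-byte trailing checksum. Returns 1 and sets *data_off/*data_len if an
 * extension with signature sig is found, 0 if it is absent, -1 if the
 * extension area is malformed.
 */
static int find_extension(const unsigned char *buf, size_t start, size_t end,
			  const char *sig, size_t *data_off, uint32_t *data_len)
{
	size_t pos = start;

	assert(start <= end);
	while (end - pos >= 8) {
		uint32_t size = get_be32(buf + pos + 4);
		if (size > end - pos - 8)
			return -1; /* claims more data than is left */
		if (memcmp(buf + pos, sig, 4) == 0) {
			*data_off = pos + 8;
			*data_len = size;
			return 1;
		}
		pos += 8 + (size_t)size; /* skip header and payload */
	}
	return pos == end ? 0 : -1;
}
```

Because each extension carries its own size, a reader can skip optional extensions it does not understand without parsing them.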
-- 
Duy