Re: GSoC - Designing a faster index format

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> · Fri, 23 Mar 2012 08:30:16 +0700

On Fri, Mar 23, 2012 at 3:32 AM, elton sky <eltonsky9404@xxxxxxxxx> wrote:
> Got a few questions:
>
> 1. index is used for building next commit, so it should only include
> files created/modified/deleted. But I see it has all entries for
> current working dir. why?

Jakub has answered this question.

> 2. From read_index_from() I see the whole index is read into mem, and
> write one by one (entry/ext) back to disk. This makes sense. But why
> we have to compute Sha1 for all entries, especially unchanged entries?

To catch disk corruption. If a bit is flipped anywhere in the index
and we do not detect it, we may end up creating broken commits.
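
For illustration only (this is not git's code), here is a minimal Python
sketch of that check. It assumes the on-disk layout in which the index file
ends with a 20-byte SHA-1 computed over everything before it; the function
names `index_checksum_ok` and `with_checksum` are made up for this example.

```python
import hashlib

SHA1_LEN = 20  # size of the trailing checksum, in bytes


def index_checksum_ok(data: bytes) -> bool:
    """Return True if the last 20 bytes are the SHA-1 of all bytes before them."""
    if len(data) < SHA1_LEN:
        return False
    body, trailer = data[:-SHA1_LEN], data[-SHA1_LEN:]
    return hashlib.sha1(body).digest() == trailer


def with_checksum(body: bytes) -> bytes:
    """Append the SHA-1 trailer, as a writer would after emitting all entries."""
    return body + hashlib.sha1(body).digest()


if __name__ == "__main__":
    # Header of an empty version-2 index: signature, version, entry count.
    header = b"DIRC" + (2).to_bytes(4, "big") + (0).to_bytes(4, "big")
    good = with_checksum(header)
    print(index_checksum_ok(good))

    # Flip a single bit in the body: the trailer no longer matches.
    bad = bytearray(good)
    bad[5] ^= 0x01
    print(index_checksum_ok(bytes(bad)))
```

Because the hash covers the whole file, the writer has to feed every entry
through it, changed or not.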

> 3. how does git track updated files? Does it compare the ts between
> working dir and index ? Or they are recorded somewhere?

Check out refresh_cache_ent(). Most commands start by calling
refresh_index() or refresh_cache(), which checks each file's mtime
against the one stored in the index (different means updated). In the
worst case, refresh_cache_ent() may call ce_compare_data(), which
computes the SHA-1 of the file and compares it with the one stored in
the index.
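
A rough sketch of that two-step check, in Python rather than git's C. It is
a simplified model, not the real refresh_cache_ent(): the `Entry` class and
the `is_modified` helper are hypothetical, and only mtime and size are
compared, whereas git looks at more stat fields.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for one index entry."""
    path: str
    mtime_ns: int
    size: int
    blob_sha1: str  # hex object id of the staged content


def blob_sha1(data: bytes) -> str:
    """Hash content the way git hashes a blob: 'blob <size>\\0' + data."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def is_modified(entry: Entry) -> bool:
    st = os.stat(entry.path)
    # Cheap path: stat data matches what the index recorded.
    if st.st_mtime_ns == entry.mtime_ns and st.st_size == entry.size:
        return False
    # Expensive path: stat data differs, so hash the file and compare ids.
    with open(entry.path, "rb") as f:
        return blob_sha1(f.read()) != entry.blob_sha1
```

The point of the design is that the hash is only computed for files whose
stat data no longer matches, so a refresh over an untouched tree costs one
stat call per file.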

> 4. When does git insert to cache tree? and when it retrieve from it?

The cache-tree is built from scratch in the cases where we know HEAD
(or some other tree) matches the index exactly (e.g. reset --hard).
Otherwise it is usually only built up at commit time
(update_main_cache_tree() in builtin/commit.c).
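
To give an idea of what "building" means here, a simplified Python model
(not git's cache-tree code): it walks a flat path -> (mode, blob id) mapping
and computes, for every directory, the tree object id and the number of
index entries underneath it. `hash_tree` and `build_cache_tree` are names
invented for this sketch.

```python
import hashlib


def hash_tree(items):
    """items: list of (mode, name, raw 20-byte id), all bytes. Returns raw tree id."""
    def sort_key(item):
        mode, name, _ = item
        # Git sorts tree entries as if directory names ended with '/'.
        return name + b"/" if mode == b"40000" else name

    body = b"".join(mode + b" " + name + b"\0" + oid
                    for mode, name, oid in sorted(items, key=sort_key))
    return hashlib.sha1(b"tree %d\0" % len(body) + body).digest()


def build_cache_tree(index):
    """index: dict of path -> (mode, hex blob id).

    Returns dict of directory path ('' is the root) -> (entry_count, hex tree id).
    """
    # Turn the flat path list into nested dicts, one dict per directory.
    nested = {}
    for path, (mode, oid) in index.items():
        *dirs, name = path.split("/")
        node = nested
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = (mode, oid)

    cache = {}

    def walk(node, prefix):
        items, count = [], 0
        for name, value in node.items():
            if isinstance(value, dict):  # subdirectory: hash it first
                child = name if not prefix else prefix + "/" + name
                sub_id, sub_count = walk(value, child)
                items.append((b"40000", name.encode(), sub_id))
                count += sub_count
            else:  # regular index entry
                mode, oid = value
                items.append((mode.encode(), name.encode(), bytes.fromhex(oid)))
                count += 1
        tree_id = hash_tree(items)
        cache[prefix] = (count, tree_id.hex())
        return tree_id, count

    walk(nested, "")
    return cache
```

With this information cached, writing the trees for a commit only needs to
rehash the directories whose entries changed; the other tree ids can be
reused as they are.
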
-- 
Duy