Re: GSoC - Designing a faster index format

Got a few questions:

1. The index is used to build the next commit, so it should only include
files that were created, modified or deleted. But I see it has entries
for every file in the current working directory. Why?

2. From read_index_from() I see the whole index is read into memory, and
written back to disk one piece (entry/extension) at a time. This makes
sense. But why do we have to compute the SHA-1 over all entries,
especially unchanged ones?

3. How does git track updated files? Does it compare timestamps between
the working directory and the index, or are they recorded somewhere?

4. When does git insert into the cache tree, and when does it retrieve
from it?


Some early thoughts on the tree format:

We could use a B-tree-like format. Keep the header at the beginning of
the file as is, but add the file length (4 bytes) and a pointer to the
extensions (8 bytes) to the header.
The entry list follows the header. Each entry starts with the number of
child offsets (1 byte), followed by the list of offsets (4 bytes each).
We can cap that number to keep the tree balanced. The other fields stay
as they are. Extensions can be located in between entries.
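A rough sketch of how the proposed header and node prefix might be
serialized, assuming big-endian (network) byte order as in the current
index format. struct proposed_header, write_header(), write_node_prefix()
and the MAX_CHILDREN cap of 16 are hypothetical names and values made up
for illustration; the proposal itself does not fix a child limit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on children per node; the proposal only says the
 * number can be limited for balance, so 16 is an arbitrary choice. */
#define MAX_CHILDREN 16

/* Existing header fields (signature, version, entry count) plus the two
 * proposed additions: total file length and offset of the extensions. */
struct proposed_header {
	uint32_t signature;   /* "DIRC" */
	uint32_t version;
	uint32_t nr_entries;
	uint32_t file_length; /* proposed, 4 bytes */
	uint64_t ext_offset;  /* proposed, 8 bytes */
};

#define HEADER_SIZE (4 + 4 + 4 + 4 + 8)

static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

static void put_be64(unsigned char *p, uint64_t v)
{
	put_be32(p, (uint32_t)(v >> 32));
	put_be32(p + 4, (uint32_t)v);
}

/* Writes the header in network byte order; returns bytes written. */
static size_t write_header(unsigned char *buf, const struct proposed_header *h)
{
	put_be32(buf, h->signature);
	put_be32(buf + 4, h->version);
	put_be32(buf + 8, h->nr_entries);
	put_be32(buf + 12, h->file_length);
	put_be64(buf + 16, h->ext_offset);
	return HEADER_SIZE;
}

/* Writes the prefix of one entry node: a 1-byte child count followed by
 * 4-byte child offsets. The remaining entry fields (stat data, SHA-1,
 * flags, path) would follow unchanged. Returns bytes written, or 0 if
 * the node has more children than the cap allows. */
static size_t write_node_prefix(unsigned char *buf, const uint32_t *children,
				size_t nr)
{
	size_t i;

	if (nr > MAX_CHILDREN)
		return 0;
	buf[0] = (unsigned char)nr;
	for (i = 0; i < nr; i++)
		put_be32(buf + 1 + 4 * i, children[i]);
	return 1 + 4 * nr;
}
```

With a cap of 16 children, a node prefix in this sketch is at most
1 + 16 * 4 = 65 bytes.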

Use the SHA-1, rather than the path, as the key for each entry node. This
avoids cases like 1000 files in one directory, which break the balance of
the tree, as Thomas mentioned. If a file is updated, the old SHA-1 can
still be found in the object directory. This also gives some flexibility.
We could use a splay tree to move updated nodes close to the root. The
downside is that the full path has to be stored in each entry.
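A minimal sketch of the SHA-1-keyed splay tree idea, using the classic
top-down splay of Sleator and Tarjan. struct entry_node, splay() and
node_insert() are hypothetical names, not existing git code; each node
keeps its full path because the key no longer encodes it.

```c
#include <string.h>

#define HASH_LEN 20 /* SHA-1 digest length in bytes */

struct entry_node {
	unsigned char sha1[HASH_LEN]; /* key: object name of the entry */
	const char *path;             /* full path must be stored explicitly */
	struct entry_node *left, *right;
};

static int cmp_key(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, HASH_LEN);
}

/* Top-down splay: brings the node with `key` (or the last node on the
 * search path) to the root and returns the new root. */
static struct entry_node *splay(struct entry_node *t, const unsigned char *key)
{
	struct entry_node header, *l, *r, *y;

	if (!t)
		return NULL;
	header.left = header.right = NULL;
	l = r = &header;
	for (;;) {
		int c = cmp_key(key, t->sha1);
		if (c < 0) {
			if (!t->left)
				break;
			if (cmp_key(key, t->left->sha1) < 0) {
				y = t->left; /* rotate right */
				t->left = y->right;
				y->right = t;
				t = y;
				if (!t->left)
					break;
			}
			r->left = t; /* link right */
			r = t;
			t = t->left;
		} else if (c > 0) {
			if (!t->right)
				break;
			if (cmp_key(key, t->right->sha1) > 0) {
				y = t->right; /* rotate left */
				t->right = y->left;
				y->left = t;
				t = y;
				if (!t->right)
					break;
			}
			l->right = t; /* link left */
			l = t;
			t = t->right;
		} else {
			break;
		}
	}
	l->right = t->left; /* reassemble the three subtrees */
	r->left = t->right;
	t->left = header.right;
	t->right = header.left;
	return t;
}

/* Inserts n and makes it the root. If a node with the same key already
 * exists (e.g. two paths with identical content), n is not inserted and
 * the existing tree is returned. */
static struct entry_node *node_insert(struct entry_node *t, struct entry_node *n)
{
	int c;

	n->left = n->right = NULL;
	if (!t)
		return n;
	t = splay(t, n->sha1);
	c = cmp_key(n->sha1, t->sha1);
	if (c < 0) {
		n->left = t->left;
		n->right = t;
		t->left = NULL;
	} else if (c > 0) {
		n->right = t->right;
		n->left = t;
		t->right = NULL;
	} else {
		return t;
	}
	return n;
}
```

In this sketch a lookup is just splay(root, key) followed by a key
comparison at the new root, so recently touched entries end up near the
top of the tree.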

Regards,
Elton

On Wed, Mar 21, 2012 at 11:01 PM, elton sky <eltonsky9404@xxxxxxxxx> wrote:
> Hi Nguyen, Thomas
>
> Thanks for the points & clues. Processing them...
>
> -Elton
>
> On Wed, Mar 21, 2012 at 10:25 PM, Thomas Rast <trast@xxxxxxxxxxxxxxx> wrote:
>> elton sky <eltonsky9404@xxxxxxxxx> writes:
>>
>>> I have questions like: how does each operation affect the index? How
>>> are the cache-tree data and the index stored?
>>> Maybe you can point me to how I could catch up quickly. I went through
>>> the article "git-for-computer-scientists", which makes a lot of sense.
>>
>> In addition to what Nguyen Thai Ngoc Duy said, check out the
>> (sub)threads
>>
>>  http://thread.gmane.org/gmane.comp.version-control.git/190016/focus=190132
>>  [origins of the GSoC project idea]
>>
>>  http://thread.gmane.org/gmane.comp.version-control.git/192014/focus=192025
>>  [perspectives of core developers in reply to the idea]
>>
>>  http://thread.gmane.org/gmane.comp.version-control.git/186244/focus=186282
>>  http://thread.gmane.org/gmane.comp.version-control.git/186357
>>  [the last few discussions about cache-tree]
>>
>> --
>> Thomas Rast
>> trast@{inf,student}.ethz.ch