I have a few questions:

1. The index is used for building the next commit, so I expected it to
   include only files that were created, modified, or deleted. But I see
   it has entries for everything in the current working dir. Why?

2. From read_index_from() I see the whole index is read into memory and
   then written back to disk one by one (entry/ext). This makes sense.
   But why do we have to compute the SHA-1 for all entries, especially
   the unchanged ones?

3. How does git track updated files? Does it compare the timestamps
   between the working dir and the index, or are changes recorded
   somewhere?

4. When does git insert into the cache tree, and when does it retrieve
   from it?

Some early thoughts on the tree format:

We can use a B-tree-like format. Keep the header at the beginning of the
file as is, but add the file length (4 bytes) and a pointer to the
extensions (8 bytes) to the header. The entry list follows the header.
Each entry starts with the number of child offsets (1 byte), followed by
the list of offsets (4 bytes each). We can limit that number to keep the
tree balanced. The other fields stay as they are. Extensions can be
located in between entries.

Use the SHA-1, rather than the path, as the key for each entry node.
This handles cases like 1000 files in one dir, which would break the
balance of the tree, as Thomas mentioned. If a file is updated, the old
SHA-1 can still be found in the object dir. This also gives flexibility.
We may use a splay tree, in order to move updated nodes close to the
root. The downside is that the full path has to be stored in each entry.

Regards,
Elton

On Wed, Mar 21, 2012 at 11:01 PM, elton sky <eltonsky9404@xxxxxxxxx> wrote:
> Hi Nguyen, Thomas
>
> Thanks for the points &clues. Processing them...
>
> -Elton
>
> On Wed, Mar 21, 2012 at 10:25 PM, Thomas Rast <trast@xxxxxxxxxxxxxxx> wrote:
>> elton sky <eltonsky9404@xxxxxxxxx> writes:
>>
>>> I got questions like: how each operations affect index? how cache tree
>>> data and index is stored?
>>> Maybe you can point me how I should catch up quickly. I went through
>>> the article "git-for-computer-scientists", that quite makes sense.
>>
>> In addition to what Nguyen Thai Ngoc Duy said, check out the
>> (sub)threads
>>
>> http://thread.gmane.org/gmane.comp.version-control.git/190016/focus=190132
>> [origins of the GSoC project idea]
>>
>> http://thread.gmane.org/gmane.comp.version-control.git/192014/focus=192025
>> [perspectives of core developers in reply to the idea]
>>
>> http://thread.gmane.org/gmane.comp.version-control.git/186244/focus=186282
>> http://thread.gmane.org/gmane.comp.version-control.git/186357
>> [the last few discussions about cache-tree]
>>
>> --
>> Thomas Rast
>> trast@{inf,student}.ethz.ch
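
To make the tree format idea above a bit more concrete, here is a rough
Python sketch of how the extended header and one entry node might be
serialized. It is only an illustration under several assumptions: the
existing 12-byte header (signature, version, entry count) is kept as is,
the stat fields of a real entry are left out for brevity, and the 2-byte
path length prefix as well as the names pack_header(), pack_node() and
unpack_node() are made up for this example, not taken from git.

```python
import struct

# Existing header fields (4-byte signature, version, entry count), followed
# by the two proposed additions: file length (4 bytes) and the offset of the
# extensions (8 bytes). Big-endian, no padding: 24 bytes in total.
HEADER = struct.Struct(">4sIIIQ")


def pack_header(version, entry_count, file_length, ext_offset):
    """Serialize the extended header (hypothetical layout)."""
    return HEADER.pack(b"DIRC", version, entry_count, file_length, ext_offset)


def pack_node(child_offsets, sha1, path):
    """Serialize one entry node.

    Layout: child count (1 byte), child offsets (4 bytes each), SHA-1 key
    (20 bytes), then the full path. The 2-byte path length prefix is an
    assumption of this sketch; the stat fields of a real entry are omitted.
    """
    if len(child_offsets) > 255:
        raise ValueError("child count must fit in one byte")
    if len(sha1) != 20:
        raise ValueError("SHA-1 must be 20 bytes")
    encoded = path.encode("utf-8")
    out = struct.pack(">B", len(child_offsets))
    out += struct.pack(">%dI" % len(child_offsets), *child_offsets)
    out += sha1
    out += struct.pack(">H", len(encoded)) + encoded
    return out


def unpack_node(buf, pos=0):
    """Parse a node written by pack_node() starting at byte offset pos."""
    (count,) = struct.unpack_from(">B", buf, pos)
    pos += 1
    children = list(struct.unpack_from(">%dI" % count, buf, pos))
    pos += 4 * count
    sha1 = bytes(buf[pos:pos + 20])
    pos += 20
    (path_len,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    path = buf[pos:pos + path_len].decode("utf-8")
    return children, sha1, path
```

Because every node carries its own child offsets, a reader could follow
a single path from the root without loading the whole file. The cost,
as noted above, is that each node has to store its full path.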