Hi Nguyen, Jakub Thank you for your explanations. Just clarify question about track updated files: On Fri, Mar 23, 2012 at 12:30 PM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote: > On Fri, Mar 23, 2012 at 3:32 AM, elton sky <eltonsky9404@xxxxxxxxx> wrote: >> Got a few questions: >> >> 1. index is used for building next commit, so it should only include >> files created/modified/deleted. But I see it has all entries for >> current working dir. why? > > Jakub has answered this question. > >> 2. From read_index_from() I see the whole index is read into mem, and >> write one by one (entry/ext) back to disk. This makes sense. But why >> we have to compute Sha1 for all entries, especially unchanged entries? > > To catch disk corruption. If a bit is flipped anywhere in the index > and we do not detect it, we may end up creating broken commits. > >> 3. how does git track updated files? Does it compare the ts between >> working dir and index ? Or they are recorded somewhere? > > Check out refresh_cache_ent. At the beginning of most commands, they > call refresh_index() or refresh_cache(), which checks a file's mtime > against one stored in index (different means updated). In the worst > scenario, refresh_cache_ent may call ce_compare_data(), which computes > SHA-1 of the specified file and compare it with one stored in index. > This means working dir will compare each entry in index on mtime field, to find out if it's updated. The complexity for this operation is O(nlogn). I assume the way of this checking is: it loops through entries in the index, for each entry, it searches in working dir and compare the mtime. Because current index is a single steam of file, when it writes back it has to write back everything sequentially. So we have to do checksum for every entry. And I suppose this process is more time consuming than previous step. If we use a tree format, still, when looking for updated files, time complexity is O(nlogn), i.e. we traverse the index entries and for each entry we refer back to working dir. However, when we write index back, we only need to recompute and write updated file nodes, but not all entries. Total processing time benefit from here. Please correct me if I am wrong. -Elton >> 4. When does git insert to cache tree? and when it retrieve from it? > > cache-tree is built from scratch in some cases, when we know HEAD (or > some tree) matches index exactly (e.g. reset --hard). Usually it's > only built up at commit time (update_main_cache_tree in > builtin/commit.c). > -- > Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html