On Tue, Apr 29, 2014 at 4:18 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Mon, Apr 28, 2014 at 3:55 AM, Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> wrote:
>> I hinted about it earlier [1]. It now passes the test suite and with a
>> design that I'm happy with (thanks to Junio for a suggestion about the
>> rename problem).
>>
>> From the user point of view, this reduces the writable size of index
>> down to the number of updated files. For example my webkit index v4 is
>> 14MB. With a fresh split, I only have to update an index of 200KB.
>> Every file I touch will add about 80 bytes to that. As long as I don't
>> touch every single tracked file in my worktree, I should not pay
>> penalty for writing 14MB index file on every operation.
>
> This is a very welcome type of improvement.
>
> I am however concerned about the complexity of the format employed.
> Why do we need two EWAH bitmaps in the new index? Why isn't this just
> a pair of sorted files that are merge-joined at read, with records in
> $GIT_DIR/index taking priority over same-named records in
> $GIT_DIR/sharedindex.$SHA1? Deletes could be marked with a bit or an
> "all zero" metadata record.

With the bitmaps, I know the exact position of each entry to replace or
delete. A merge-join would work too, but I would need to walk through all
entries in both indexes comparing entry names and stages, which is a bit
costly in my opinion. Also, if you look at the format description in
patch 0017, I store the replaced entries without their names to save a
bit more space.

"EWAH" is just an implementation detail. A straightforward bitmap should
work fine too (one bit per entry, so about 25KB for 200k entries, which
seems reasonable).
--
Duy
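To illustrate the point about knowing positions directly, here is a minimal C sketch of applying a replace bitmap and a delete bitmap to a shared index in a single pass, with no name or stage comparisons. It is not the code from the patch series: `struct entry`, `apply_split`, `bit_is_set` and `bitmap_bytes` are hypothetical names, the entry is heavily simplified (`meta` stands in for stat data, mode and object name), and it uses plain uncompressed bitmaps rather than EWAH. It assumes replacement records are stored in the order of the set bits with their names omitted, that no position is set in both bitmaps, and it leaves out newly added entries, which would still have to be merged in by name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified index entry. `meta` stands in for all
 * per-entry data other than the path and stage (stat data, mode, object
 * name, flags). */
struct entry {
	char name[64];
	unsigned stage;
	char meta[8];
};

/* Size of a plain (uncompressed) bitmap with one bit per shared entry. */
static size_t bitmap_bytes(size_t nr_entries)
{
	return (nr_entries + 7) / 8;
}

static int bit_is_set(const uint8_t *bits, size_t pos)
{
	return (bits[pos / 8] >> (pos % 8)) & 1;
}

/*
 * Apply the replace/delete bitmaps of a split index to the shared index.
 *
 * replace_bits: bit i set => shared[i] takes its metadata from the next
 *               record in `repl`; the record's name is ignored (assumed
 *               not stored), so the name and stage come from shared[i].
 * delete_bits:  bit i set => shared[i] is dropped.
 *
 * Assumes no position is set in both bitmaps and that `repl` is in the
 * order of the set replace bits. Newly added entries are not handled.
 * Writes the surviving entries to `out` and returns how many there are.
 */
static size_t apply_split(const struct entry *shared, size_t nr_shared,
			  const uint8_t *replace_bits,
			  const uint8_t *delete_bits,
			  const struct entry *repl, size_t nr_repl,
			  struct entry *out)
{
	size_t i, r = 0, n = 0;

	for (i = 0; i < nr_shared; i++) {
		if (bit_is_set(delete_bits, i))
			continue; /* position known directly, no name lookup */
		out[n] = shared[i];
		if (bit_is_set(replace_bits, i)) {
			assert(r < nr_repl);
			memcpy(out[n].meta, repl[r].meta, sizeof(out[n].meta));
			r++;
		}
		n++;
	}
	assert(r == nr_repl); /* every replacement record was consumed */
	return n;
}
```

Called with shared entries `a.c`, `b.c`, `c.c`, replace bits `0x01` and delete bits `0x04`, it returns two entries: `a.c` carrying the replacement metadata and `b.c` unchanged. `bitmap_bytes(200000)` gives 25000, matching the 25KB estimate for a straightforward bitmap; EWAH would additionally run-length compress the long stretches of zero bits when only a few entries change.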