Re: [PATCH 00/32] Split index mode for very large indexes

On Tue, Apr 29, 2014 at 4:18 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Mon, Apr 28, 2014 at 3:55 AM, Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> wrote:
>> I hinted about it earlier [1]. It now passes the test suite and with a
>> design that I'm happy with (thanks to Junio for a suggestion about the
>> rename problem).
>>
>> From the user point of view, this reduces the writable size of index
>> down to the number of updated files. For example my webkit index v4 is
>> 14MB. With a fresh split, I only have to update an index of 200KB.
>> Every file I touch will add about 80 bytes to that. As long as I don't
>> touch every single tracked file in my worktree, I should not pay the
>> penalty of writing a 14MB index file on every operation.
>
> This is a very welcome type of improvement.
>
> I am however concerned about the complexity of the format employed.
> Why do we need two EWAH bitmaps in the new index? Why isn't this just
> a pair of sorted files that are merge-joined at read, with records in
> $GIT_DIR/index taking priority over same-named records in
> $GIT_DIR/sharedindex.$SHA1?  Deletes could be marked with a bit or an
> "all zero" metadata record.
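
For comparison, here is a minimal sketch of the merge-join read suggested above. It assumes entries are hypothetical (path, stage, metadata) tuples sorted by (path, stage), uses plain string ordering as a stand-in for the index's byte-wise name order, and uses metadata None as a stand-in for the "all zero" delete record. It is an illustration, not Git code. Note that every step compares names and stages.

```python
from typing import List, Optional, Tuple

# Hypothetical entry: (path, stage, metadata). metadata None marks a delete,
# standing in for the "all zero" metadata record.
Entry = Tuple[str, int, Optional[str]]


def merge_join(shared: List[Entry], overlay: List[Entry]) -> List[Entry]:
    """Merge two lists sorted by (path, stage); overlay records win on ties."""
    out: List[Entry] = []
    i = j = 0
    while i < len(shared) and j < len(overlay):
        skey, okey = shared[i][:2], overlay[j][:2]
        if skey < okey:
            # Untouched shared entry: copy it through.
            out.append(shared[i])
            i += 1
        else:
            # Equal keys: the overlay record replaces (or deletes) the shared one.
            if skey == okey:
                i += 1
            if overlay[j][2] is not None:
                out.append(overlay[j])
            j += 1
    # Whatever remains on either side has no counterpart on the other.
    out.extend(shared[i:])
    out.extend(e for e in overlay[j:] if e[2] is not None)
    return out
```

With shared entries a, b, c and an overlay that rewrites b, deletes c and adds d, the result is a (unchanged), b (rewritten) and d.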

With the bitmaps, I know the exact position at which to replace or delete
an entry. A merge-join would work, but I would need to walk through all
entries in both indexes and compare entry name and stage, which is a bit
costly in my opinion. And if you look at the format description in patch
0017, I store the replaced entries without their names to save a bit more
space. "EWAH" is just an implementation detail. A straightforward
bitmap should work fine too (one bit per entry, so 25KB for 200k entries,
seems reasonable).
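
A minimal sketch of the bitmap approach, under stated assumptions: plain bool lists stand in for the (EWAH or uncompressed) bitmaps, one bitmap marks shared entries to delete and one marks entries to replace, the k-th replacement record (stored without its path) fills the k-th set bit of the replace bitmap, and newly added paths are merged in afterwards. All names here are hypothetical; this is not the code from the series.

```python
import heapq
from typing import List, Tuple

Entry = Tuple[str, int, str]  # hypothetical (path, stage, metadata)


def bitmap_bytes(n_entries: int) -> int:
    """Size of a plain, uncompressed bitmap with one bit per shared entry."""
    return (n_entries + 7) // 8


def apply_split(shared: List[Entry], delete_bits: List[bool],
                replace_bits: List[bool], replacements: List[str],
                added: List[Entry]) -> List[Entry]:
    """Rebuild the full index from a shared base plus a small split index."""
    assert len(delete_bits) == len(shared) == len(replace_bits)
    merged = list(shared)
    k = 0
    for pos, bit in enumerate(replace_bits):
        if bit:
            # Position comes from the bitmap, so no name comparison is needed;
            # path and stage are taken from the shared entry being replaced.
            path, stage, _ = shared[pos]
            merged[pos] = (path, stage, replacements[k])
            k += 1
    # Delete after replacing so the original positions stay valid.
    kept = [e for pos, e in enumerate(merged) if not delete_bits[pos]]
    # Only brand-new paths still need a merge by (path, stage).
    return list(heapq.merge(kept, added, key=lambda e: (e[0], e[1])))
```

For example, `bitmap_bytes(200_000)` gives 25,000 bytes, matching the 25KB estimate above. Replacing and deleting touch only the flagged positions; the merge cost falls only on entries that are genuinely new.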
-- 
Duy