Re: [GSoC] Designing a faster index format

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> On Fri, Apr 6, 2012 at 10:22 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>> On Thu, Apr 5, 2012 at 14:49, Thomas Rast <trast@xxxxxxxxxxx> wrote:
>>> This is quite complete already, which I think is great, but it's still
>>> missing one "obvious" approach: a directory-tree based layout that uses
>>> "flat" storage.  That is, the entries grouped by directory and thus
>>> arranged into the "natural" tree, so as to allow parsing only part of
>>> it.  But not pulling any tricks to make it easy to change; a nontrivial
>>> change would mean a rewrite.  How good do you think that could be?
>>
>> I have been wondering this myself. Aren't most updates to the index
>> just updating the stat information of an existing entry?
>>
>> If so we could structure the index as flat lists for each directory
>> similar to a canonical tree, but with a wider field to hold not just
>> the SHA-1 but also the stat information of each file. If the entry is
>> just the component name ("foo.c" and not "src/foo.c") and the SHA-1
>> and stat data, you can probably protect the entire entry with a CRC-32
>> for each entry. Updates can be performed in place by taking the write
>> lock with index.lock as an empty file, then overwriting the SHA-1 and
>> stat field followed, by updating the CRC-32. Readers that see the
>> wrong CRC-32 for an entry can sleep for a short period, retry the
>> read, and fail after some number of attempts if they cannot get a
>> valid read of that entry.
>
> But does that mean the reader can have some old entries (because the
> writer has not updated them yet by the time of reading) mixed with new
> ones, i.e. inconsistent data?

That can be fixed with another CRC or hash covering the CRCs for the
entries, so that a reader can notice a partial update.

But even so: do we make any promises that (say) git-add is atomic in the
sense that a reader always gets the before-update results or the
after-update results?  Non-builtins (e.g. git add -p) may make small
incremental updates to the index, so they wouldn't be atomic anyway.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]