Re: [GSoC] Designing a faster index format

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 6, 2012 at 10:22 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Thu, Apr 5, 2012 at 14:49, Thomas Rast <trast@xxxxxxxxxxx> wrote:
>> This is quite complete already, which I think is great, but it's still
>> missing one "obvious" approach: a directory-tree based layout that uses
>> "flat" storage.  That is, the entries grouped by directory and thus
>> arranged into the "natural" tree, so as to allow parsing only part of
>> it.  But not pulling any tricks to make it easy to change; a nontrivial
>> change would mean a rewrite.  How good do you think that could be?
>
> I have been wondering this myself. Aren't most updates to the index
> just updating the stat information of an existing entry?
>
> If so we could structure the index as flat lists for each directory
> similar to a canonical tree, but with a wider field to hold not just
> the SHA-1 but also the stat information of each file. If the entry is
> just the component name ("foo.c" and not "src/foo.c") and the SHA-1
> and stat data, you can probably protect the entire entry with a CRC-32
> for each entry. Updates can be performed in place by taking the write
> lock with index.lock as an empty file, then overwriting the SHA-1 and
> stat field followed, by updating the CRC-32. Readers that see the
> wrong CRC-32 for an entry can sleep for a short period, retry the
> read, and fail after some number of attempts if they cannot get a
> valid read of that entry.

But does that mean the reader can have some old entries (because the
writer has not updated them yet by the time of reading) mixed with new
ones, i.e. inconsistent data?


> Adding a new file or deleting an existing file from the index would
> require a full rewrite.
>
> Within a single tree/directory entry, it probably doesn't have to be
> binary searchable. Canonical trees aren't. Linear scans through a
> directory is OK, so long as the scans are broken up by the directory
> tree structure just like they are in canonical trees.
>
> Dealing with the conflict stages during merges (1, 2, 3) could be
> handled by appending the conflict data at the end of the index file,
> when conflicts are resolved this tail region could be truncated back
> to the real end of the file. A bit could be set on the normal entry in
> the trees to denote there is a conflict, and additional stage data is
> expected in the tail region of the file.



-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]