Re: [PATCH 3/6] Stop producing index version 2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 7, 2012 at 10:09 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> 2012/2/5 Junio C Hamano <gitster@xxxxxxxxx>:
>> Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:
>>
>>> read-cache.c learned to produce version 2 or 3 depending on whether
>>> extended cache entries exist in 06aaaa0 (Extend index to save more flags
>>> - 2008-10-01), first released in 1.6.1. The purpose is to keep
>>> compatibility with older git. It's been more than three years since
>>> then and git has reached 1.7.9. Drop support for older git.
>>
>> Cc'ing this, as I suspect this would surely raise eyebrows of some people
>> who wanted to get rid of the version 3 format.
>
> Version 3 was a mistake because of the variable length record sizes.
> Saving 2 bytes on some records that don't use the extended flags makes
> the index file *MUCH* harder to parse. So much so that we should take
> version 3 and kill it, not encourage it as the default!

Probably too late for that, but it's good to know there are strong
user base for v2.

> <thinking type="wishful" probability="never-happen"
> probably-inflating-flame-from="linus">
>
> I have long wanted to scrap the current index format. I unfortunately
> don't have the time to do it myself. But I suspect there may be a lot
> of gains by making the index format match the canonical tree format
> better by keeping the tree structure within a single file stream,
> nesting entries below their parent directory, and keeping tree SHA-1
> data along with the directory entry. For one thing the index would be
> able to register an empty subdirectory, rather than ignoring them. It
> would also better line up with the filesystem's readdir() handling,
> giving us more sane logic to compare what readdir() tells us exists
> against what the index thinks should be in the same file. And the
> overall index should be smaller, because we don't have to repeat the
> same path/to/a/file/for/every/file/in/that/same/directory/tree.
> Reconstructing the path strings at read time into a flat list should
> be pretty trivial, and still keep the parallel lstat calls running off
> a flat list working well for fast status operations.
>
> </thinking>

Haven't really thought through, but I suppose we could create extended
tree object format (there is info in cache entry that's not in tree
entry), store index in this format, then pack together and store the
pack as part of index file. Append-only access to index would be
possible by appending a new pack of new trees to index) I think with
tree-based index, we could kill a big chunk of code (merging trees and
index together) in unpack_trees(). With further efforts to remove
list-based index usage, we could even kill match_pathspec_depth(),
making tree_entry_interesting() the only function to match patchspec.
But dreams probably never come true.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]