Re: [PATCH 2/2] index-v4: document the entry format

Hi Junio,

I seem to have completely missed the earlier series at

  http://thread.gmane.org/gmane.comp.version-control.git/194660

My bad.

Thomas has been working on a prototype converter over the past few days,
with results similar to (but not quite as good as) your numbers

    $ ls -l .git/index*
    -rw-r----- 1 jch eng 25586488 2012-04-03 15:27 .git/index
    -rw-r----- 1 jch eng 14654328 2012-04-03 15:38 .git/index-4

while taking a different approach with different tradeoffs.

Nevertheless...

> +  (Version 4) In version 4, the entry path name is prefix-compressed
> +    relative to the path name for the previous entry (the very first
> +    entry is encoded as if the path name for the previous entry is an
> +    empty string).  At the beginning of an entry, an integer N in the
> +    variable width encoding (the same encoding as the offset is encoded
> +    for OFS_DELTA pack entries; see pack-format.txt) is stored, followed
> +    by a NUL-terminated string S.  Removing N bytes from the end of the
> +    path name for the previous entry, and replacing it with the string S
> +    yields the path name for this entry.
[..]
> +  (Version 4) In version 4, the padding after the pathname does not
> +  exist.
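
For reference, the path encoding described in the quoted text can be
sketched roughly as follows.  This is only an illustration, not the code
from the patch: it handles path names alone (the stat data and flags of
a real entry are left out), the function names are made up for this
example, and the varint routines assume the OFS_DELTA offset encoding
from pack-format.txt.

```python
def encode_varint(value):
    """Encode a non-negative int in the OFS_DELTA offset style (big-endian groups)."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        value -= 1  # the "minus one" bias removes redundant encodings
        out.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(out))


def decode_varint(buf, pos=0):
    """Decode one varint starting at buf[pos]; return (value, next_pos)."""
    c = buf[pos]
    pos += 1
    value = c & 0x7F
    while c & 0x80:
        c = buf[pos]
        pos += 1
        value = ((value + 1) << 7) | (c & 0x7F)
    return value, pos


def compress_paths(paths):
    """Prefix-compress a sorted list of byte-string paths (hypothetical helper)."""
    out = bytearray()
    prev = b""  # the first entry is encoded against an empty string
    for path in paths:
        common = 0
        limit = min(len(prev), len(path))
        while common < limit and prev[common] == path[common]:
            common += 1
        n = len(prev) - common  # bytes to strip from the end of prev
        out += encode_varint(n) + path[common:] + b"\0"
        prev = path
    return bytes(out)


def decompress_paths(data, count):
    """Inverse of compress_paths; entries must be read in order."""
    paths = []
    prev = b""
    pos = 0
    for _ in range(count):
        n, pos = decode_varint(data, pos)
        end = data.index(b"\0", pos)  # S is NUL-terminated
        path = prev[:len(prev) - n] + data[pos:end]
        paths.append(path)
        prev = path
        pos = end + 1
    return paths
```

Note that decompress_paths() has to walk the entries from the start,
since each path is only defined relative to the one before it.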

I think there are actually several separate ideas here:

* The prefix compression.  Thomas is not using this idea; we've been
  toying with making the index bisectable (within each directory) for
  fast single-entry lookups, which inherently conflicts with this,
  because a prefix-compressed entry can only be decoded after the one
  before it.  The directory-like layout partially achieves the same
  effect, since it elides common path components.

* The varint encoding (or offset encoding, but "varint" is something you
  can google :-).  David suggested using it on stat() data, combined
  with zigzag encoding and delta against the first entry in the
  directory, which gives some good compression results.  Profiling will
  have to say whether the extra decoding effort is worth the space
  savings.

* The lack of variable padding, which is a good idea -- in any case I
  seem to remember Shawn complaining about it.
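
To make the stat() idea from the second point a bit more concrete, here
is a rough sketch of zigzag plus varint encoding applied to a list of
mtimes, each stored as a delta against the first entry.  Everything here
is an assumption made for illustration: the function names are
hypothetical, the varint is the plain little-endian 7-bit variant (not
the OFS_DELTA offset encoding), and the actual field layout David has in
mind may well differ.

```python
def zigzag_encode(n):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def zigzag_decode(z):
    """Inverse of zigzag_encode."""
    return (z >> 1) if (z & 1) == 0 else -((z + 1) >> 1)


def encode_uvarint(value):
    """Little-endian base-128 varint: 7 payload bits per byte, MSB = 'more follows'."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf, pos=0):
    """Decode one varint starting at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_mtimes(mtimes):
    """Store the first mtime in full, the rest as zigzag deltas against it."""
    base = mtimes[0]
    out = bytearray(encode_uvarint(base))
    for t in mtimes[1:]:
        out += encode_uvarint(zigzag_encode(t - base))
    return bytes(out)


def decode_mtimes(data, count):
    """Inverse of encode_mtimes for a directory with `count` entries."""
    base, pos = decode_uvarint(data)
    result = [base]
    for _ in range(count - 1):
        z, pos = decode_uvarint(data, pos)
        result.append(base + zigzag_decode(z))
    return result
```

Files in one directory tend to have nearby timestamps, so most deltas
should fit in one or two bytes instead of a fixed 32-bit field; the cost
is the per-field decode loop, which is what the profiling would have to
weigh.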

-- 
Thomas Rast
trast@{inf,student}.ethz.ch