Re: [PATCH 2/2] index-v4: document the entry format

Thomas Rast <trast@xxxxxxxxxxxxxxx> writes:

> I seem to have completely missed the earlier series at
>
>   http://thread.gmane.org/gmane.comp.version-control.git/194660
>
> My bad.
>
> Thomas has been working on a prototype converter over the past few days,
> with results similar to (but not quite as good as) your numbers

The "entry-shrinkage" v4 itself is an afternoon hack (even though it is a
good hack), and any design that would not come close to its result is not
worth considering.  It is good to hear that the student is making progress
learning.

> I think there are actually several separate ideas here:
>
> * The prefix compression.  Thomas is not using this idea; we've been
>   toying with making the index bisectable (within each directory) for
>   fast single-entry lookups, which inherently conflicts with this.  The
>   directory-like layout partially achieves the same (elides common path
>   components).
>
> * The varint encoding (or offset encoding, but "varint" is something you
>   can google :-).  David suggested using it on stat() data, combined
>   with zigzag encoding and delta against the first entry in the
>   directory, which gives some good compression results.  Profiling will
>   have to say whether the extra decoding effort is worth the space
>   savings.
>
> * The lack of variable padding, which is a good idea -- in any case I
>   seem to remember Shawn complaining about it.
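
To make the varint bullet concrete, here is a minimal sketch in Python
(not git's code) of the combination described above: a stat() field of
the first entry in a directory is stored as a plain varint, and the same
field of each later entry is stored as a zigzag-encoded delta against
it. The varint shown is the LEB128-style one a web search turns up; that
choice is an assumption, and the offset encoding used by the v4 series
may differ in detail. All function names are hypothetical.

```python
def zigzag_encode(n):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def zigzag_decode(z):
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def varint_encode(value):
    """LEB128-style varint (assumed variant): 7 data bits per byte, high bit = 'more follows'."""
    if value < 0:
        raise ValueError("varint_encode expects a non-negative integer")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def varint_decode(data, pos=0):
    """Return (value, position just past the varint)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def encode_stat_field(values):
    """Hypothetical helper: first value verbatim, the rest as deltas against it."""
    base = values[0]
    out = bytearray(varint_encode(base))
    for v in values[1:]:
        out += varint_encode(zigzag_encode(v - base))
    return bytes(out)


def decode_stat_field(data, count):
    """Inverse of encode_stat_field for a known number of entries."""
    base, pos = varint_decode(data)
    values = [base]
    for _ in range(count - 1):
        z, pos = varint_decode(data, pos)
        values.append(base + zigzag_decode(z))
    return values
```

With this sketch, three mtimes within a few seconds of each other take
7 bytes rather than the 12 they would need as fixed 32-bit fields; the
price is the per-field decoding loop that profiling would have to
justify.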

I am planning to merge this series early to 'master', before the GSoC
student really starts working on the code, perhaps by this Wednesday. The
earlier parts of this series refactor code to make things easier to
modify, and the later parts of it demonstrate by example both:

 (1) how the backward compatibility must be handled at the design level
     [*1*]; and

 (2) how such a design can be coded cleanly at the implementation level.

The hope is that this will give a solid base on which to build whatever
new work comes next (perhaps call it v5). I do not mind David's further
work built on top of this series, but I think the entry-shrinkage design
for v4 is good enough as-is. I am afraid that letting the code become
slushy again at this point may make your student's work unnecessarily
cumbersome.

How do you want to proceed?


[Footnote]

*1* Here are the minimum requirements.

 - you can read both old and new formats (obviously);

 - by default you write out in the same version as the original you read;

 - have a single simple command to explicitly specify what format to
   write out; and

 - make sure that the new format is something older readers can
   reliably notice is new and beyond the version they support.
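
As a rough illustration of these four requirements, here is a minimal
Python sketch (not git's code). It assumes the index header layout of a
4-byte "DIRC" signature followed by a 4-byte version and a 4-byte entry
count, all big-endian, and treats versions 2 to 4 as the supported set
purely for the example. The function and exception names are
hypothetical.

```python
import struct

SIGNATURE = b"DIRC"
SUPPORTED_VERSIONS = (2, 3, 4)  # assumption for this sketch


class UnsupportedIndexVersion(Exception):
    """Raised when the file declares a version this reader does not know."""


def read_header(data):
    """Parse the 12-byte header and refuse versions beyond what we support."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != SIGNATURE:
        raise ValueError("not an index file")
    if version not in SUPPORTED_VERSIONS:
        # An older reader reliably notices a newer format here and stops.
        raise UnsupportedIndexVersion(version)
    return version, count


def choose_write_version(version_read, requested=None):
    """Default to the version that was read; change only on explicit request."""
    if requested is None:
        return version_read
    if requested not in SUPPORTED_VERSIONS:
        raise UnsupportedIndexVersion(requested)
    return requested


def write_header(version, count):
    """Emit a header carrying the chosen version."""
    return struct.pack(">4sII", SIGNATURE, version, count)
```

The point of keeping the version in a fixed place at the very front is
that a reader can decide whether to go on before it has to interpret a
single entry.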