Thomas Rast <trast@xxxxxxxxxxxxxxx> writes:

> I seem to have completely missed the earlier series at
>
>   http://thread.gmane.org/gmane.comp.version-control.git/194660
>
> My bad.
>
> Thomas has been working on a prototype converter over the past few days,
> with results similar to (but not quite as good as) your numbers

The "entry-shrinkage" v4 itself is an afternoon hack (even though it is
a good hack), and any design that would not come close to its result is
not worth considering.  It is good to hear that the student is making
progress learning.

> I think there are actually several separate ideas here:
>
> * The prefix compression.  Thomas is not using this idea; we've been
>   toying with making the index bisectable (within each directory) for
>   fast single-entry lookups, which inherently conflicts with this.  The
>   directory-like layout partially achieves the same (elides common path
>   components).
>
> * The varint encoding (or offset encoding, but "varint" is something you
>   can google :-).  David suggested using it on stat() data, combined
>   with zigzag encoding and delta against the first entry in the
>   directory, which gives some good compression results.  Profiling will
>   have to say whether the extra decoding effort is worth the space
>   savings.
>
> * The lack of variable padding, which is a good idea -- in any case I
>   seem to remember Shawn complaining about it.

I am planning to merge this series early to 'master', before the GSoC
student really starts working on the code, perhaps by this Wednesday.

The earlier parts of this series refactor code to make things easier to
modify, and the later parts of it demonstrate by example both:

 (1) how the backward compatibility must be handled at the design level
     [*1*]; and

 (2) how such a design can be coded cleanly at the implementation level.

The hope is that this will give a solidified base to build whatever new
work on top of (perhaps call it v5).
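To make the quoted varint suggestion concrete, here is a minimal sketch
of varint plus zigzag encoding, with each value stored as a delta against
the first entry.  It uses a generic LEB128-style varint, which is not
necessarily the byte layout this series or the prototype uses, and all
function names are made up for illustration.

```python
# Sketch only: generic LEB128-style varint, not the series' on-disk layout.
# All names here are hypothetical.

def zigzag_encode(n):
    """Map signed to unsigned so small magnitudes stay small:
    0, -1, 1, -2 become 0, 1, 2, 3."""
    return 2 * n if n >= 0 else -2 * n - 1

def zigzag_decode(z):
    """Inverse of zigzag_encode."""
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)

def varint_encode(value):
    """Encode a non-negative integer, 7 bits per byte, low bits first;
    the high bit of each byte says whether more bytes follow."""
    if value < 0:
        raise ValueError("varint_encode needs a non-negative integer")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def varint_decode(buf, pos=0):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def pack_field(values):
    """Store the first value as-is, the rest as zigzag deltas against it."""
    base = values[0]
    out = bytearray(varint_encode(base))
    for v in values[1:]:
        out += varint_encode(zigzag_encode(v - base))
    return bytes(out)

def unpack_field(buf, count):
    """Inverse of pack_field for a known number of values."""
    base, pos = varint_decode(buf)
    values = [base]
    for _ in range(count - 1):
        z, pos = varint_decode(buf, pos)
        values.append(base + zigzag_decode(z))
    return values
```

Timestamps of files in one directory tend to sit close together, so the
deltas fit in a byte or two each; whether the decoding cost is worth it
is the profiling question raised in the quoted text.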
I do not mind David's further work built on top of this series, but I
think the entry-shrinkage design for v4 is good enough as-is.  I am
afraid that letting the code go slushy again at this point may make your
student's work unnecessarily more cumbersome.

How do you want to proceed?


[Footnote]

*1* Here are the minimum requirements.

 - you can read both old and new formats (obviously);

 - by default you write out in the same version as the original you
   read;

 - have a single simple command to explicitly specify what format to
   write out; and

 - make sure that the new format is something older readers can
   reliably notice is new and beyond the version they support.
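
The four requirements in the footnote can be sketched roughly as below.
This is an illustration under stated assumptions, not the code in the
series: it relies only on the index header starting with the 4-byte
signature "DIRC", a 4-byte big-endian version number and a 4-byte entry
count, and the supported-version set and every function name are
hypothetical.

```python
import struct

SIGNATURE = b"DIRC"
# Assumption for this sketch: the reader understands these versions only.
SUPPORTED_VERSIONS = {2, 3, 4}

class UnsupportedIndexVersion(Exception):
    """The file is an index, but newer than this reader understands."""

def read_version(header):
    """Parse the 12-byte header and refuse versions we do not know, so an
    older reader reliably notices a newer format instead of misparsing."""
    signature, version, _entry_count = struct.unpack(">4sII", header[:12])
    if signature != SIGNATURE:
        raise ValueError("not an index file")
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedIndexVersion("index version %d" % version)
    return version

def version_to_write(read_as, requested=None):
    """By default keep the version that was read; switch only when the
    user explicitly asks for another supported version."""
    if requested is None:
        return read_as
    if requested not in SUPPORTED_VERSIONS:
        raise UnsupportedIndexVersion("index version %d" % requested)
    return requested
```

Keeping the version check at the very top of the reader means a file in
a later format is rejected before any entry is parsed.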