On Wed, May 2, 2012 at 11:58 AM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> On Fri, Apr 6, 2012 at 3:41 PM, David Barr <davidbarr@xxxxxxxxxx> wrote:
>> On Thu, Apr 5, 2012 at 4:44 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>> Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:
>>>
>>>> On Wed, Apr 4, 2012 at 5:53 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>> ...
>>>> I wonder what causes user time drop from .29s to .13s here. I think
>>>> the main patch should increase computation, even only slightly, not
>>>> less.
>>>
>>> The main patch reduced the amount of the data needs to be sent to the
>>> machinery to checksum and write to disk by about 45%, saving both I/O
>>> and computation.
>>
>> I hacked together a quick patch to try predictive coding the other
>> fields of the index. I got a further 34% improvement in size over
>> this series. Patches to come. I just used the previous cache entry as
>> the predictor and reused varint.h together with zigzag encoding[1].
>>
>> That's a total improvement in size over v2 of 62%.
>
> Have you posted (and I missed) the patches? I'm interested in seeing
> what changes you made.

I haven't posted anything - my proof of concept was write-only and slow.
I added a prelude with a bitmask that describes which fields differ
with the previous entry.
For each differing field, I encoded something like:

  diff := this - prev;
  zigzag := (diff << 1) ^ (diff >> 31)
  raw := zigzag - 1 /* zero impossible because of mask */
  write_varint(raw)

I also experimented with using unique sha1 prefixes but it was slow
and probably introduces race conditions.

>> [1] https://developers.google.com/protocol-buffers/docs/encoding#types

--
David Barr
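
For illustration, the scheme described above (a bitmask prelude, then
a zigzag-coded delta against the previous entry for each differing
field, written as a varint) could be sketched in C roughly as below.
This is not the proof-of-concept patch. The names encode_entry,
decode_entry, put_varint, get_varint and NFIELDS are hypothetical, the
entry is assumed to be a flat array of four 32-bit fields, and the
varint is a plain LEB128-style one as in protocol buffers, standing in
for git's varint.h. The shifts are done on unsigned values so the
wraparound is well defined, and there is no validation of the input
being decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NFIELDS 4 /* assumption: four 32-bit fields per entry */

/* LEB128-style varint; a stand-in, not git's varint.h. */
static size_t put_varint(uint32_t v, unsigned char *out)
{
	size_t n = 0;
	while (v >= 0x80) {
		out[n++] = (unsigned char)(v | 0x80);
		v >>= 7;
	}
	out[n++] = (unsigned char)v;
	return n;
}

static size_t get_varint(const unsigned char *in, uint32_t *v)
{
	size_t n = 0;
	unsigned shift = 0;
	uint32_t r = 0;
	unsigned char c;
	do {
		c = in[n++];
		r |= (uint32_t)(c & 0x7f) << shift;
		shift += 7;
	} while (c & 0x80);
	*v = r;
	return n;
}

/* Writes the mask byte plus one varint per changed field; returns bytes used. */
static size_t encode_entry(const uint32_t *prev, const uint32_t *cur,
			   unsigned char *out)
{
	size_t n = 1; /* out[0] is reserved for the bitmask */
	unsigned char mask = 0;
	int i;
	for (i = 0; i < NFIELDS; i++) {
		uint32_t diff = cur[i] - prev[i]; /* wraps modulo 2^32 */
		uint32_t zigzag;
		if (!diff)
			continue;
		mask |= (unsigned char)(1u << i);
		/* same value as (diff << 1) ^ (diff >> 31) on a signed diff */
		zigzag = (diff << 1) ^ (0u - (diff >> 31));
		assert(zigzag != 0); /* zero impossible because of mask */
		n += put_varint(zigzag - 1, out + n);
	}
	out[0] = mask;
	return n;
}

/* Rebuilds cur from prev and the encoded bytes; returns bytes consumed. */
static size_t decode_entry(const uint32_t *prev, uint32_t *cur,
			   const unsigned char *in)
{
	size_t n = 1;
	int i;
	for (i = 0; i < NFIELDS; i++) {
		uint32_t raw, zigzag;
		cur[i] = prev[i];
		if (!(in[0] & (1u << i)))
			continue;
		n += get_varint(in + n, &raw);
		zigzag = raw + 1;
		cur[i] = prev[i] + ((zigzag >> 1) ^ (0u - (zigzag & 1)));
	}
	return n;
}
```

An output buffer of 1 + 5 * NFIELDS bytes is enough for the worst
case, since each 32-bit value takes at most five varint bytes. Fields
that change by small amounts between neighbouring entries cost one
byte each, and unchanged fields cost only their bit in the mask.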