Re: Decompression speed: zip vs lzo

On Thu, Jan 10, 2008 at 09:30:59PM +0000, Nicolas Pitre wrote:
> On Thu, 10 Jan 2008, Linus Torvalds wrote:
> 
> > 
> > 
> > On Thu, 10 Jan 2008, Nicolas Pitre wrote:
> > > 
> > > Here's my rather surprising results:
> > > 
> > > My kernel repo pack size without the patch:	184275401 bytes
> > > Same repo with the above patch applied:		205204930 bytes
> > > 
> > > So it is only 11% larger.  I was expecting much more.
> > 
> > It's probably worth doing those statistics on some other projects.
> > 
> > Maybe the difference to other repositories isn't huge, and maybe the 
> > kernel *is* a good test-case, but I just wouldn't take that for granted. 
> 
> Obviously.
> 
> This was a really crude test, and my initial goal was to quickly dismiss 
> Pierre's assertion.  Turns out that he wasn't that wrong after all,

  Well, that wasn't a random assertion. I made it because I assumed that
a delta is usually smaller than a few hundred bytes, and as compression
is applied only to the delta, without context, you end up packing about
500 bytes at a time, which will seldom give excellent compression ratios.
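
  A rough way to see that effect, as a sketch only: the snippet below
uses Python's zlib module as a stand-in for what git does, and the
sample text from make_sample() is made up for the illustration, not
taken from any repository. It compresses the same data once as a single
stream and once as independent 500-byte pieces.

```python
import zlib


def make_sample(n_lines=2000):
    # Hypothetical source-like text; an assumption for the demo,
    # not real repository data.
    return "".join(
        f"static int field_{i} = compute_value({i});\n" for i in range(n_lines)
    ).encode()


def compressed_sizes(data, chunk=500):
    # Size when the whole buffer is one zlib stream.
    whole = len(zlib.compress(data))
    # Total size when every `chunk`-byte piece is its own zlib stream,
    # so no piece can reuse context from the others.
    pieces = sum(
        len(zlib.compress(data[i:i + chunk])) for i in range(0, len(data), chunk)
    )
    return whole, pieces


if __name__ == "__main__":
    data = make_sample()
    whole, pieces = compressed_sizes(data)
    print(f"raw:                     {len(data)} bytes")
    print(f"one stream:              {whole} bytes")
    print(f"separate 500-byte parts: {pieces} bytes")
```

  The separately compressed pieces add up to more than the single
stream, because each piece pays its own stream overhead and starts with
an empty dictionary. How large the gap is on real deltas would have to
be measured.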

> and 
> if a significant increase in access speed by avoiding zlib for 82% of 
> object accesses can also be demonstrated for the kernel, then we have an 
> opportunity for some optimization tradeoff with no backward 
> compatibility concerns.

  Well, one could use the fact that deltas are not packed to avoid
copying them around, and that will _necessarily_ be a gain (you can
read them where they have been mmapped, for instance). The numbers that
were given for git annotate use a compression level of `0', which
doesn't exploit that fact, and I wouldn't be surprised to see a
noticeable gain if one does that.
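
  To illustrate what "read them where they have been mmapped" could
look like, here is a minimal sketch using Python's mmap and memoryview.
The file layout (a 4-byte "HDR!" marker followed by one stored payload)
and the name read_in_place_demo() are invented for the example and have
nothing to do with git's actual pack format.

```python
import mmap
import os
import tempfile


def read_in_place_demo():
    # Hypothetical uncompressed payload standing in for a stored delta.
    payload = b"copy 40 bytes from base, insert 'x'\n" * 4
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.pack")
        with open(path, "wb") as f:
            f.write(b"HDR!" + payload)
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Views must be released before the mapping is closed,
            # hence the nested with-blocks.
            with memoryview(mm) as whole:
                with whole[4:4 + len(payload)] as view:
                    # The slice refers to the mapped pages directly;
                    # no intermediate buffer is allocated for it.
                    shares_mapping = view.obj is mm
                    matches = view == payload
    return shares_mapping, matches


if __name__ == "__main__":
    print(read_in_place_demo())
```

  With a compressed payload the reader would instead have to inflate
into a freshly allocated buffer before it could look at a single byte.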

  And actually, maybe it's not the deltas that we should leave unpacked,
but objects under a certain size (say 512 bytes?), whichever type they
have, with the code exploiting that fact for real and avoiding copies.
With this criterion, I expect the repository to not grow a lot larger
(I'd say quite a bit less than the 10% you had, as even in the kernel
there _are_ some larger deltas, and we definitely lose space for them;
I'd expect less than a 5% size variation), and I _think_ it's worth
investigating. At least I expect visible results on commands (like blame
or even log[0]) that go through a lot of small objects, with a 10 to 20%
speed increase (backed up by some experience I have in avoiding copies,
in not-so-similar cases though, so it may be less, and I'll stand
corrected, and a bit disappointed).
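
  The size criterion above could be sketched roughly as follows. This is
an illustration under assumptions, not a proposal for the on-disk
format: the one-byte RAW/DEFLATED tags and the encode()/decode() names
are hypothetical, and 512 is just the figure floated above.

```python
import zlib

THRESHOLD = 512  # bytes; the cut-off suggested above, not a measured optimum
RAW = b"\x00"       # hypothetical tag: body stored as-is
DEFLATED = b"\x01"  # hypothetical tag: body is a zlib stream


def encode(obj):
    # Small objects are stored uncompressed, whatever their type.
    if len(obj) < THRESHOLD:
        return RAW + obj
    return DEFLATED + zlib.compress(obj)


def decode(blob):
    view = memoryview(blob)
    if view[:1] == RAW:
        # No inflate and no copy: hand back a view into the stored bytes.
        return view[1:]
    # Larger objects still go through zlib and a new buffer.
    return zlib.decompress(view[1:])


if __name__ == "__main__":
    small = b"fix typo in README\n"
    large = b"some longer file content\n" * 200
    for obj in (small, large):
        stored = encode(obj)
        assert bytes(decode(stored)) == obj
        print(len(obj), "->", len(stored), "bytes stored")
```

  Whether the avoided inflate and copy on the small-object path is worth
the extra pack size is exactly what would need benchmarking.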

  [0] If I'm correct, commit messages are "objects" in their own right,
      and I don't expect them to be over 512 octets very often.
-- 
·O·  Pierre Habouzit
··O                                                madcoder@xxxxxxxxxx
OOO                                                http://www.madism.org
