Re: serious performance issues with images, audio files, and other "non-code" data

On Wed, May 12, 2010 at 02:53:53PM -0400, John wrote:

> We're seeing serious performance issues with repos that store media
> files, even relatively small files. For example, a web site with less
> than 100 MB of images can take minutes to commit, push, or pull when
> images have changed.

That sounds way too slow in my experience. I have a repository with 3
gigabytes of photos and videos. Committing 20M of new images takes a
second or two. The biggest slowdown is doing the sha1 over the new data
(which actually happens during "git add").

What version of git are you using? Have you tried "commit -q" to
suppress the diff at the end of commit?

Can you show us exactly what commands you're using, along with timings
so we can see where the slowness is?

For pushing and pulling, you're probably seeing delta compression, which
can be slow for large files (though again, minutes sounds excessive to
me). It _can_ be worth doing for images, if you do things like change
only exif tags but not the image data itself. But if the images
themselves are changing, you probably want to try setting the "-delta"
attribute. Like:

  echo '*.jpg -delta' >.gitattributes
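A minimal sketch of the same idea for several media types (the extension
list here is an assumption; adjust it to your content), verified with
"git check-attr" in a throwaway repository:

```shell
# Mark common media extensions as poor delta candidates, then confirm
# that git sees the attribute as unset for a matching path.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
cat >.gitattributes <<'EOF'
*.jpg -delta
*.png -delta
*.mp4 -delta
EOF
git check-attr delta photo.jpg    # prints: photo.jpg: delta: unset
```

Note that "check-attr" works even for paths that don't exist yet, so you
can sanity-check the patterns before adding any files.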

Also, consider repacking your repository, which will generate a packfile
that will be re-used during push and pull.
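For reference, a one-time full repack looks like this (shown in a
throwaway demo repository; in practice you'd run the repack line inside
your existing repository):

```shell
# Repack everything into a single packfile so later push/pull can reuse
# the packed data instead of recompressing from scratch.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo hello >file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm "initial"
git repack -a -d    # -a: pack all reachable objects; -d: drop redundant packs
ls .git/objects/pack/    # now contains a .pack/.idx pair
```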

> Our first guess was that git is repeatedly attempting to
> compress/decompress data that had already been compressed. We tried

Git does spend a fair bit of time in zlib for some workloads, but it
should not create problems on the order of minutes.

>    core.compression 0   ## Docs say this disables compression. Didn't seem to work.

That should disable zlib compression of loose objects and objects within
packfiles. It can save a little time for objects which won't compress,
but you will lose the size benefits for any text files.

But it won't turn off delta compression, which is what the
"compressing..." phase during push and pull is doing, and which is much
more likely the cause of the slowness.

>    pack.depth 1     ## Unclear what this does.

It limits delta chains to a depth of 1. It's probably not what you
want.

>    pack.window 0    ## No idea what this does.

It sets the number of other objects git will consider when doing delta
compression. Setting it low should improve your push/pull times. But you
will lose the substantial benefit of delta-compression of your non-image
files (and git's meta objects). So the "-delta" option above for
specific files is a much better solution.
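For completeness, here are the three settings quoted above applied to a
throwaway repository (values taken from the original question; this is
illustrative, not a recommendation, since the per-path "-delta"
attribute is the better tool):

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config core.compression 0   # zlib level for stored objects (0 = none)
git config pack.depth 1         # cap delta chains at depth 1
git config pack.window 0        # candidates considered per delta search
git config pack.window          # prints: 0
```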

>    gc.auto 0        ## We hope this disables automatic packing.

It disables automatic repacking when you have a lot of objects. You
_have_ to pack when pushing and pulling, since packfiles are the
on-the-wire format. What will help is:

  1. Having repositories already packed, since git can re-use the packed
     data.

  2. Using -delta so that things which delta poorly are just copied into
     the packfile as-is.
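One way to check point 1, i.e. whether a repository is already well
packed, is "git count-objects -v" (demo repository below is throwaway):

```shell
# "count" is the number of loose objects, "in-pack" the number already
# inside packfiles; after a full repack, count should be 0 and in-pack
# should hold everything.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo hello >file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm "initial"
git repack -a -d
git count-objects -v
```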

> Is there a trick to getting git to simply "copy files as is"?  In
> other words, don't attempt to compress them, don't attempt to "diff"
> them, just store/copy/transfer the files as-is?

Hopefully you can pick out the answer to that question from the above
statements. :)

-Peff
