Re: large(25G) repository in git

"Marcel M. Cary" <marcel@xxxxxxxxxxxxxxxx> · Thu, 26 Mar 2009 08:43:39 -0700

Adam Heath wrote:
> We maintain a website in git.  This website has a bunch of backend
> server code, and a bunch of data files.  A lot of these files are full
> videos.
>
> We use git, so that the distributed nature of website development can
> be supported.  Quite often, you'll have a production server, with
> online changes occurring (we support in-browser editing of content), a
> preview server, where large-scale code changes can be previewed, then
> a development server, one per programmer (or more).

My company manages code in a similar way, except that we avoid this kind
of issue (with 100 gigabytes of user-uploaded images and other data) by
not checking in the data.  We even went so far as to halve the size of
our repository by removing 2GB of non-user-supplied images: rounded
corners, background gradients, logos, etc.  This made Git noticeably
faster.

While I'd love for Git to handle your kind of use case and data size in
that way, I think handling hundreds of gigabytes of binary data is a
little beyond its intended usage.

I imagine that as your web site grows (which I assume is your goal),
scaling Git will continue to be a challenge.

Maybe you can find a way to:

* Get along with less data in your non-production environments; we're
hoping to be able to do this eventually

* Find other ways to copy it; we use rsync even though it does take
forever to crawl over the file system

* At least put your data files in a separate Git repository, assuming
you check in, update, and release code more often than your video files.
That way you'll experience the pain less often, and you may even be able
to tune that repository differently.
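As a rough illustration of the separate-repository idea, here is a minimal sketch that walks a working tree and splits files into "code" and "data" by extension and size, then emits root-anchored .gitignore lines for the data files so the code repository stops tracking them.  The extension set, the 10 MiB threshold, and all function names are hypothetical choices for this example, not anything from this thread.

```python
import os
from pathlib import Path

# Hypothetical extensions treated as "data"; adjust for your own site.
DATA_EXTENSIONS = {".mp4", ".mov", ".avi", ".flv", ".jpg", ".jpeg", ".png", ".gif"}
# Hypothetical cutoff: any file at least this large also counts as data.
SIZE_THRESHOLD = 10 * 1024 * 1024  # 10 MiB


def is_data_file(path: Path, size: int, size_threshold: int) -> bool:
    """Classify a file as data by its extension or by its size."""
    return path.suffix.lower() in DATA_EXTENSIONS or size >= size_threshold


def partition_tree(root: str, size_threshold: int = SIZE_THRESHOLD) -> tuple[list[str], list[str]]:
    """Walk root and return sorted (code, data) lists of root-relative POSIX paths."""
    root_path = Path(root)
    code: list[str] = []
    data: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Skip Git's own metadata directory entirely.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            full = Path(dirpath) / name
            rel = full.relative_to(root_path).as_posix()
            # lstat so a dangling symlink does not raise.
            size = full.lstat().st_size
            (data if is_data_file(full, size, size_threshold) else code).append(rel)
    return sorted(code), sorted(data)


def escape_gitignore(path: str) -> str:
    """Backslash-escape characters that .gitignore would treat as glob syntax."""
    escaped = "".join("\\" + ch if ch in "\\*?[" else ch for ch in path)
    # Git ignores trailing spaces in patterns unless each is backslash-escaped.
    stripped = escaped.rstrip(" ")
    return stripped + "\\ " * (len(escaped) - len(stripped))


def gitignore_lines(data_files: list[str]) -> list[str]:
    """A leading slash anchors each pattern to the directory holding .gitignore."""
    return ["/" + escape_gitignore(p) for p in data_files]
```

The data list could then seed the separate data repository, and the generated lines could go into the code repository's .gitignore.  Files that are already tracked also need `git rm --cached` before Git will stop tracking them.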

Marcel