Marcel M. Cary wrote:
> My company manages code in a similar way, except we avoid this kind of
> issue (with 100 gigabytes of user-uploaded images and other data) by not
> checking in the data.  We even went so far as to halve the size of our
> repository by removing 2GB of non-user-supplied images -- rounded
> corners, background gradients, logos, etc, etc.  This made Git
> noticeably faster.  Disk space is cheap.
>
> While I'd love to be able to handle your kind of use case and data size
> with Git in that way, it's a little beyond the intended usage to handle
> hundreds of gigabytes of binary data, I think.
>
> I imagine as your web site grows, which I'm assuming is your goal, your
> problems with scaling Git will continue to be a challenge.
>
> Maybe you can find a way to:
>
> * Get along with less data in your non-production environments; we're
>   hoping to be able to do this eventually

We do that by only cloning/checking out certain modules.  However, as is
always the case, sometimes a bug occurs with production data, and you need
the real data to track it down.

> * Find other ways to copy it; we use rsync even though it does take
>   forever to crawl over the file system
>
> * Put your data files in a separate Git repository, at least, assuming
>   you check in, update, and release code more often than your video
>   files.  That way you'll experience pain less often, and maybe even be
>   able to tune your repository differently.

As already mentioned, our sub-sites *are* in separate repos.  There's a
base repository that has just the event/backend code, and then 32 *other*
repositories where the actual websites live.

We want to use *some* kind of versioning system.  Having a history of
*all* changes is extremely useful, not to mention being able to track what
each separate user does as they modify their files through their browser.

Subversion is right out.  It's centralized, and it leaves its .svn poop
all over the working copy.

Mercurial is also right out.  If you make several *separate* commits of
*separate* files but don't push for some time, and then do a push/pull
where the sum total of the changes is larger than about 2GB, Mercurial
fails when it then tries to update the working directory.  That 2GB
ceiling is a hard-coded Python limit (even on a 64-bit host), because
Mercurial reads the entire set of changes into a single Python string.

git mmaps files and does window scanning of the pack files.  It *might*
read a single file entirely into memory for compression purposes; I'm not
certain about that.  We certainly haven't hit any limits that cause it to
fail outright.

I haven't tried any others.
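
Since the difference between the two approaches is the crux of it, here is
a toy sketch in Python of what I mean -- only an illustration, not either
tool's actual code, and the function names and 32MB window size are made
up.  Reading everything into one string is bounded by how large a string
you can hold; mmap'ing and walking the file a fixed-size window at a time
keeps memory use bounded no matter how large the pack is:

    import mmap
    import os

    def read_all(path):
        # Mercurial-style, as described above: the entire set of changes
        # ends up in one string, which is where the 2GB limit comes in.
        with open(path, 'rb') as f:
            return f.read()

    def scan_windows(path, window=32 * 1024 * 1024):
        # git-style: mmap the file and walk it one fixed-size window at a
        # time, so resident memory stays bounded regardless of file size.
        size = os.path.getsize(path)
        with open(path, 'rb') as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                for off in range(0, size, window):
                    yield m[off:off + window]

For what it's worth, git also exposes its windowing as configuration
(core.packedGitWindowSize, core.packedGitLimit, pack.windowMemory), so how
much it maps at once can be tuned down on machines with little RAM.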