At Khan Academy, we are running a Jenkins installation as our build server. By design, our Jenkins machine has several different directories that each hold a copy of the same git repository. (For instance, Jenkins may be running tests on our repo at several different commits at the same time.) When Jenkins decides to run a test -- I'm simplifying a bit -- it will pick one of the copies of the repo, do a 'git fetch origin && git checkout <some commit>', and then run the tests.

Our repo has a lot of churn and some big files, and this git fetch can take a long time. I'd like to reduce both the fetch time and the disk space used by sharing objects between the repo copies. My research has turned up three techniques that try to address this use case:

* git clone --reference
* git clone --shared
* git clone <local repo>, which creates hard links

I can probably use any of these approaches, but git clone --reference would be the easiest to set up. I would do so by creating a 'cache' repo that exists only to serve as a reference and is not used in any other way, so I wouldn't have to worry about the dangers of pruning, accidentally deleting the repo, etc. (There's a rough sketch of this setup in the P.S. below.) My big concern is that all of these methods seem to affect only clone. So:

Question 1) If I do 'git clone --reference', will the reference repo be used for subsequent fetches as well? What about 'git clone --shared'?

Question 2) If I git clone a local repo, will subsequent fetches also create hard links?

Question 3) If the answer to any of the above is yes, how does this work with packing? Say I pack the reference repo (being careful not to prune anything). Will subsequent fetches still be able to get the objects they need from the reference repo?

An added complication is submodules. We have a submodule that is as big and slow to fetch as our main repository.

Question 4) Is there a practical way to set up submodules so they can use the same object-sharing framework that the main repo does? I'm not keen on rewriting .gitmodules in each of my repos, so something that uses info/alternates is probably the most workable. I have a scheme for setting that up that maybe will work (also sketched in the P.S. below), but it's a moot point if info/alternates doesn't work for fetching.

I'm wondering if the best approach for us might be to use GIT_OBJECT_DIRECTORY: set GIT_OBJECT_DIRECTORY to the shared cache directory for each of our repos, so they all fetch to the same place.

Question 5) I haven't seen this mentioned anywhere else, so I'm guessing it won't work. Am I missing a big problem?

Question 6) Will git be sad if two different repos that share an object directory both do 'git fetch' at the same time? I could maybe protect fetches with a flock, but Jenkins can do git operations behind my back, so it would be easier if I didn't have to worry about locking.

Question 7) Is GIT_OBJECT_DIRECTORY supposed to work with submodules? In my experimentation, it looks like it doesn't: when I run 'GIT_OBJECT_DIRECTORY=../obj git submodule update --init', it still puts the objects in .git/modules/<submodule>/objects/. Is this a bug? Is there any way to work around it?

Any suggestions would be appreciated! It feels to me like this is something that git should support pretty easily given its architecture, but I just don't see a way to do it.

Thanks,
craig
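
P.S. To make the schemes above concrete, here is roughly the --reference setup I have in mind. The URL and paths are made up for illustration:

    # One-time setup: a bare 'cache' repo used only as a reference,
    # never checked out, pruned, or otherwise touched.
    git clone --mirror https://example.com/big-repo.git /var/cache/git/big-repo.git

    # Each Jenkins workspace clones against it; --reference records the
    # cache's object directory in .git/objects/info/alternates.
    git clone --reference /var/cache/git/big-repo.git \
        https://example.com/big-repo.git /jenkins/workspace-1/big-repo

    # Refresh the cache periodically (cron or a dedicated Jenkins job):
    git --git-dir=/var/cache/git/big-repo.git fetch origin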
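
The info/alternates scheme I mentioned for the submodule would look something like this; 'big-sub' stands in for the submodule's name under .git/modules/, and the cache path is again made up:

    # After 'git submodule update --init' in a workspace, point the
    # submodule's object store at a shared cache of its objects.
    echo /var/cache/git/big-sub.git/objects \
        >> .git/modules/big-sub/objects/info/alternates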
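
The GIT_OBJECT_DIRECTORY idea from questions 5-7 would amount to something like this (paths made up):

    # Every workspace repo reads and writes objects in one shared
    # directory instead of its own .git/objects.
    export GIT_OBJECT_DIRECTORY=/var/cache/git/shared-objects
    cd /jenkins/workspace-1/big-repo
    git fetch origin            # objects land in the shared directory
    git checkout <some commit>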
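
And if locking turns out to be necessary for question 6, the flock protection I mentioned would be along these lines (the lock file path is arbitrary):

    # Serialize concurrent fetches into the shared object directory
    # with flock(1); any fixed lock file will do.
    flock /var/cache/git/fetch.lock git fetch origin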