At $WORK, we're trying to figure out how to handle a combinatorial explosion of research projects, backends, and big binary toolchains, and I'm looking into leveraging git for the problem. I'd appreciate the list's advice; here are the basics (I've tried to make it as succinct as possible):

We have a suite composed of multiple sub-projects; each of these projects supports some subset of a bunch of backend targets:

    repo/
      project1/
        targetA/
        targetB/
        targetD/
        ...
      project2/
        targetC/
        targetD/
        ...
      project3/
        targetB/
        ...
      ...

The complication is that most of these targets depend on some (possibly experimental) toolchain for compilation, and even for execution in some cases. We get these tools from upstream as big binary blobs with timestamps. We are very interested in tracking how performance changes with updates in our code as well as with changes in the toolchain, so it makes sense to collect performance numbers for a given version of the code and package those numbers along with info about which version of a given tool was used to compile/execute.

Naively, we could just put the tool right in the repository like so:

    repo/
      project1/
        targetA/
          targetA-toolchain/
        targetB/
          targetB-toolchain/
        ...
      project3/
        targetB/
          targetB-toolchain/
        ...
      ...

The duplication of tools alone is pretty wasteful in disk space, but worse still, as we update tools and commit them, the size of the repo (with all of these giant binary toolchains in it) is likely to get pretty huge, and become a burden to those who want to clone the repo and just work on the tip of some branch.

So what we really want is to have git make sure to grab the right version of each toolchain whenever we move HEAD; I haven't got tons of experience with submodules, but I think that they are well-suited to handle this problem.

This is where I'd like to leverage the list's knowledge; there are a few ways to handle this, and I'm not sure which is the best. Again, what we want is:

1. For a given checkout of repo/ to have each targetXXX-toolchain/ directory populated, keeping in mind that different projects may not all be using the same version of a toolchain.

2. For end-users of the repo/ (cloners) to avoid having to lug around every version of every toolchain; we don't want to 'pollute' the repo with the large (>100 MB) toolchain blobs we get from upstream.

3. If possible, to further minimize disk usage by using some sort of indirection (soft or hard linking, for example) when multiple projects *are* using the same version of a given toolchain.

4. To minimize maintenance overhead.

Here are a few ideas I've had on the subject:

I. We could make a separate repo for each tool:

    targetA-toolchain-repo/
    targetB-toolchain-repo/

and have each targetXXX-toolchain/ subdirectory in the main repo be a submodule checkout of the corresponding tool repo. This satisfies #1, #2, and possibly #4, but not #3.

II. Make a separate repo for each tool, but instead of having multiple submodule checkouts, just have a tools directory with checkouts of all currently-needed tools, with symlinks from each projectXXX/targetXXX/targetXXX-toolchain to these top-level submodules, i.e.:

    repo/
      tools/
        targetA-toolchain-ver-c0ffee/    <- submodule checkout of targetA-toolchain
        targetB-toolchain-ver-deadbeef/  <- submodule checkout of targetB-toolchain
        targetB-toolchain-ver-b00b1e5/   <- submodule checkout of targetB-toolchain
      project1/
        targetA/
          targetA-toolchain/  <- symlink to ../../tools/targetA-toolchain-ver-c0ffee/
        targetB/
          targetB-toolchain/  <- symlink to ../../tools/targetB-toolchain-ver-deadbeef/
        ...
      project3/
        targetB/
          targetB-toolchain/  <- symlink to ../../tools/targetB-toolchain-ver-b00b1e5/
        ...
      ...

This is good on #1, #2, and #3, but has the complication that there's a bit of maintenance involved: setting up symlinks, deleting unused submodules in tools/, etc.

I'd really appreciate any comments/advice the list might have on this.
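
To make the maintenance cost of option II a bit more concrete, here is a rough Python sketch of the bookkeeping it would seem to need. Everything in it is hypothetical: the `plan_layout` helper, the `pins` mapping, and the "0ld" version name are made up for illustration (the other version strings just reuse the ones from the tree above). It only computes which symlinks and tools/ checkouts a given set of pins implies; it does not touch git or the filesystem.

```python
import posixpath


def plan_layout(pins, existing_tools=()):
    """Work out the option-II layout for a set of toolchain pins.

    pins: {(project, target): version}, e.g. {("project1", "targetA"): "c0ffee"}
    existing_tools: tools/ directories currently present (paths relative to repo/)

    Returns (symlinks, needed, stale) where
      symlinks maps each link path to its relative symlink target,
      needed   is the sorted list of tools/ checkouts that must exist,
      stale    is the sorted list of existing checkouts nothing points at.
    """
    symlinks = {}
    needed = set()
    for (project, target), version in sorted(pins.items()):
        # Assumed naming scheme, copied from the example tree above.
        tool_dir = f"tools/{target}-toolchain-ver-{version}"
        needed.add(tool_dir)

        link_dir = f"{project}/{target}"
        link_path = f"{link_dir}/{target}-toolchain"
        # A symlink target is resolved relative to the directory holding the
        # link; rooting both paths at "/" keeps relpath independent of the cwd.
        symlinks[link_path] = posixpath.relpath("/" + tool_dir, "/" + link_dir)

    stale = sorted(set(existing_tools) - needed)
    return symlinks, sorted(needed), stale


if __name__ == "__main__":
    pins = {
        ("project1", "targetA"): "c0ffee",
        ("project1", "targetB"): "deadbeef",
        ("project3", "targetB"): "b00b1e5",
    }
    # "0ld" is a made-up leftover checkout that no project pins any more.
    existing = ["tools/targetB-toolchain-ver-0ld"]
    symlinks, needed, stale = plan_layout(pins, existing)
    for link, dest in symlinks.items():
        print(f"{link} -> {dest}")
    print("needed:", needed)
    print("stale: ", stale)
```

Two projects pinned to the same version would end up pointing at the same tools/ directory, which is where the disk saving for #3 would come from; the `stale` list is the part that would have to be cleaned up by hand (or by a script like this) as pins move.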
I know my way around most of git pretty well, but I haven't really used submodules before.

Jason