In my company we generate test data that we want coupled with test code, and despite the size, we have historically kept our test data in our code base. This is becoming a problem: 95% of our 500 MB "code" base is actually test data, and the amount of test data is likely to grow, perhaps radically. We are contemplating files on the order of 500 MB apiece. Many of our developers have multiple copies of the code base checked out, duplicating the test data, so we would like a solution that minimizes the amount of data we have to check out.

Personally, I dislike having separate test-data and code repos; keeping the two synchronized seems like a real pain. I like to be able to do things like:

    cd component_x
    [muck muck muck on part "y"]
    mkdir testsuite/component_x.part_y
    cd testsuite/component_x.part_y
    [muck muck muck]
    git commit -a -m "Finished mucking with part y of component x"

where the directory structure is, essentially:

    component_x/
        testsuite/component_x.part_y

If we separate out the test data, the above would require two commits in two repos, switching directories, etc. (sketched in the P.S. below). And then there is the issue of ensuring that checkouts of the code also pull in the associated data. I can see this becoming a real nightmare.

Have others on the list grappled with this and come up with good solutions with git? I know there has been some talk of submodules, but I'm not sure whether that works yet or is even a viable option here.

Bill
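
P.S. To make the two-repo pain concrete, here is roughly what the dual-commit dance would look like with a separate test-data repo. This is just a sketch of what I mean; the "testdata" checkout location and commit messages are made up:

    cd component_x
    [muck muck muck on part "y"]
    git commit -a -m "Finished mucking with part y of component x"
    cd ../testdata/component_x.part_y    # sibling checkout of the data repo
    [muck muck muck]
    git commit -a -m "Matching test data for part y of component x"
    # ...and nothing records which code commit goes with which data commit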
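
P.P.S. For the submodule idea, my understanding (possibly wrong, so treat this as a sketch; the URLs and paths are made up) is that the setup would look something like:

    # in the superproject (the code repo), record the data repo at a path:
    git submodule add git://example.com/testdata.git testsuite
    git commit -m "Add test data as a submodule"

    # what a fresh checkout would then need:
    git clone git://example.com/code.git
    cd code
    git submodule init
    git submodule update    # fetches the pinned test-data revision

That at least records which test-data revision goes with which code revision, though it is still two commits in two repos when both change.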