Multi-target software repo, plus toolchain tracking

At $WORK, we're trying to figure out how to handle a combinatorial
explosion of research projects, backends, and big binary toolchains,
and I'm looking into leveraging git for the problem. I'd appreciate the
list's advice; here are the basics (I've tried to keep this as succinct
as possible):

We have a suite composed of multiple sub-projects; each of these
projects supports some subset of a bunch of backend targets:

repo/
      project1/
		targetA/
		targetB/
		targetD/
		...
      project2/
		targetC/
		targetD/
		...
      project3/
		targetB/
		...
      ...

The complication is that most of these targets depend on some
(possibly experimental) toolchain for compilation, and even execution
in some cases.  We get these tools from upstream as big binary blobs
with timestamps.

We are very interested in tracking how performance changes with
updates in our code as well as changes in the toolchain, so it makes
sense to collect some performance numbers with some code and package
those numbers along with info about which version of a given tool was
used to compile/execute.  Naively, we could just put the tool right in
the repository like so:

repo/
      project1/
		targetA/
			targetA-toolchain/
		targetB/
			targetB-toolchain/
		...
      project3/
		targetB/
			targetB-toolchain/
		...
      ...

The duplication of tools alone is pretty wasteful in disk space, but
worse still, as we update tools and commit them, the size of the repo
(with all of these giant binary toolchains in it) is likely to get
pretty huge, and become a burden to those who want to clone the repo
and just work on the tip of some branch.

So what we really want is to have git make sure to grab the right
version of each toolchain whenever we move HEAD; I haven't got tons of
experience with submodules, but I think that they are well-suited to
handle this problem.
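
With submodules, "which version of the tool" becomes a commit id, so a
performance record could carry it next to the commit of our own code.
Here is a rough sketch of such a record; the function name, the field
names, the commit ids and the metric are all made up for illustration,
not something we have today:

```python
import json


def make_perf_record(code_commit, target, toolchain_commit, metrics):
    """Bundle measurements with the code and toolchain versions used.

    All field names here are hypothetical; the point is only that the
    numbers never travel without both version identifiers.
    """
    return {
        "code_commit": code_commit,            # commit of repo/ that was measured
        "target": target,                      # e.g. "project1/targetA"
        "toolchain_commit": toolchain_commit,  # commit of the toolchain submodule
        "metrics": dict(metrics),              # copy, so later edits don't leak in
    }


# Placeholder values, purely for illustration.
record = make_perf_record(
    code_commit="0123abc",
    target="project1/targetA",
    toolchain_commit="c0ffee",
    metrics={"runtime_s": 12.5},
)
print(json.dumps(record, sort_keys=True, indent=2))
```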

This is where I'd like to leverage the list's knowledge; there are a
few ways to handle this, and I'm not sure which is the best.  Again,
what we want is:

1. For a given checkout of repo/ to have each targetXXX-toolchain/
directory populated, keeping in mind that different projects may not
all be using the same version of a toolchain
2. For end-users of the repo/ (cloners) to avoid having to lug around
every version of every toolchain; we don't want to 'pollute' the repo
with the large (>100 MB) toolchain blobs we get from upstream.
3. If possible, further minimize disk usage by using some sort of
indirection (soft or hard linking, for example) when multiple projects
*are* using the same version of a given toolchain.
4. Minimize maintenance overhead

Here are a few ideas I've had on the subject:

I. We could make a separate repo for each tool:

targetA-toolchain-repo/
targetB-toolchain-repo/

and each targetXXX-toolchain/ subdirectory in the main repo is a
submodule checkout of the corresponding tool repo. This satisfies #1,
#2, and possibly #4, but not #3.
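
As far as I understand it, option I boils down to one .gitmodules entry
per projectXXX/targetXXX/targetXXX-toolchain path, with the pinned
version stored as a commit in the main repo's tree. Normally
`git submodule add <url> <path>` writes that file; the sketch below only
renders the text so the layout is visible. The helper name and the
example.org URLs are placeholders I made up:

```python
def render_gitmodules(entries):
    """Render .gitmodules text from (path, url) pairs.

    Each path gets its own section, so two projects can point at the
    same toolchain repo while being pinned to different commits.
    """
    lines = []
    for path, url in entries:
        lines.append(f'[submodule "{path}"]')
        lines.append(f"\tpath = {path}")
        lines.append(f"\turl = {url}")
    return "\n".join(lines) + "\n"


# Hypothetical URLs; only the paths come from the layout above.
entries = [
    ("project1/targetA/targetA-toolchain",
     "git://example.org/targetA-toolchain-repo.git"),
    ("project1/targetB/targetB-toolchain",
     "git://example.org/targetB-toolchain-repo.git"),
    ("project3/targetB/targetB-toolchain",
     "git://example.org/targetB-toolchain-repo.git"),
]
gitmodules_text = render_gitmodules(entries)
print(gitmodules_text)
```

Note that project1 and project3 each get their own checkout of the
targetB toolchain here, which is exactly why this option fails #3.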

II. Make a separate repo for each tool, but instead of having multiple
submodule checkouts, just have a tools directory with checkouts of all
currently-needed tools, with symlinks from the
projectXXX/targetXXX/targetXXX-toolchain to these top-level
submodules.

i.e.

repo/
      tools/
            targetA-toolchain-ver-c0ffee/    <- submodule checkout of targetA-toolchain
            targetB-toolchain-ver-deadbeef/  <- submodule checkout of targetB-toolchain
            targetB-toolchain-ver-b00b1e5/   <- submodule checkout of targetB-toolchain
      project1/
		targetA/
			targetA-toolchain/  <- symlink to ../../tools/targetA-toolchain-ver-c0ffee/
		targetB/
			targetB-toolchain/  <- symlink to ../../tools/targetB-toolchain-ver-deadbeef/
		...
      project3/
		targetB/
			targetB-toolchain/  <- symlink to ../../tools/targetB-toolchain-ver-b00b1e5/
		...
      ...

This is good on #1, #2, and #3, but falls short on #4: there's a bit
of maintenance involved, such as setting up symlinks and deleting
unused submodules in tools/.
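
The maintenance in option II could probably be scripted. Below is a
minimal sketch, assuming a POSIX filesystem and a hand-maintained
mapping (here called `pins`, a name I made up) from each toolchain
symlink path to the tools/ checkout it should point at. It only manages
the symlinks and reports unused tools/ entries; adding or removing the
submodules themselves would still be done with git:

```python
import os
import tempfile


def sync_toolchain_links(repo_root, pins):
    """Create or refresh relative symlinks into tools/.

    pins maps a link path relative to repo_root, such as
    "project1/targetA/targetA-toolchain", to a directory name under
    tools/. Returns the sorted tools/ entries that no pin refers to.
    """
    tools_dir = os.path.join(repo_root, "tools")
    for link_rel, tool_name in pins.items():
        link_path = os.path.join(repo_root, link_rel)
        link_dir = os.path.dirname(link_path)
        os.makedirs(link_dir, exist_ok=True)
        # Relative target, so the links survive moving the whole repo.
        target = os.path.relpath(os.path.join(tools_dir, tool_name), link_dir)
        if os.path.islink(link_path):
            os.remove(link_path)  # replace a stale link; real dirs are left alone
        os.symlink(target, link_path)  # raises if a non-link already exists there
    used = set(pins.values())
    return sorted(name for name in os.listdir(tools_dir) if name not in used)


# Demo in a throwaway directory that mimics the layout above.
with tempfile.TemporaryDirectory() as root:
    for name in ("targetA-toolchain-ver-c0ffee",
                 "targetB-toolchain-ver-deadbeef",
                 "targetB-toolchain-ver-b00b1e5"):
        os.makedirs(os.path.join(root, "tools", name))
    pins = {
        "project1/targetA/targetA-toolchain": "targetA-toolchain-ver-c0ffee",
        "project1/targetB/targetB-toolchain": "targetB-toolchain-ver-deadbeef",
    }
    unused = sync_toolchain_links(root, pins)
    link_target = os.readlink(
        os.path.join(root, "project1", "targetA", "targetA-toolchain"))

print(link_target)
print(unused)
```

In the demo nothing pins targetB-toolchain-ver-b00b1e5, so it comes back
as a candidate for removal from tools/.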

I'd really appreciate any comments/advice the list might have on this.
I know my way around most of git pretty well, but I haven't really
used submodules before.

Jason