Re: [RFC] Submodules in GIT

What if we use Linus's "module" file concept and allow the link objects
to track subtrees? An object might look like this:

commit: <SHA1>
link: <SHA1> /path/to/remote/tree/or/blob
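
A minimal sketch of how such an object could be read, in Python. The
field meanings are an assumption here: "commit:" is taken to name the
remote commit, and "link:" to name the tree or blob plus its path inside
that commit. LinkObject and parse_link_object are hypothetical names for
illustration only; nothing like this exists in git.

```python
import re
from typing import NamedTuple

# A SHA1 is written as 40 lowercase hex digits.
SHA1_RE = re.compile(r"[0-9a-f]{40}")


class LinkObject(NamedTuple):
    """Hypothetical in-memory form of the proposed link object."""
    commit: str   # SHA1 from the "commit:" line
    target: str   # SHA1 from the "link:" line (tree or blob)
    path: str     # path to the remote tree or blob


def parse_link_object(text: str) -> LinkObject:
    """Parse the two-line "commit:" / "link:" layout shown above."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        fields[key.strip()] = value.strip()

    commit = fields["commit"]
    # The link line carries a SHA1, one space, then the path.
    target, _, path = fields["link"].partition(" ")

    for sha in (commit, target):
        if not SHA1_RE.fullmatch(sha):
            raise ValueError(f"not a SHA1: {sha!r}")
    return LinkObject(commit, target, path.strip())
```

Keeping the path separate from the SHA1 is what would let the same
object point at either a whole tree or a single blob.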


Tracking upstream library:
--------------------------
clone as usual


Inhouse libraries/applications:
-------------------------------
To satisfy versioning of build-dependencies - make links of type
"external/lib1_header.h" -> "<commit>/headers/lib1_header.h" (blob)
"external/lib1_interface" -> "<commit>/api" (tree)

If git supports "sparse fetching" of subtrees, we can follow the
history in the submodule only for the files we want, without
fetching the whole subtree. The "modules" file could specify something
like "always clone on fetch".


Build environment
------------------
First make links to all tools, applications, etc ...
"buildtools/random_app1" -> "<commit>/"
"buildtools/random_app2" -> "<commit>/"
"sub_build_projects/user_interface" -> "<commit>/"
"sub_build_projects/kernel" -> "<commit>/"
"apps/special_app1" -> "<commit>/"
"libs/special_lib1" ->
"<commit-from-another-build-project>/special/lib/binary/path"

Here we can have a build system that for example creates an "i386"
folder and the repo itself


Documentation release
----------------------
"Lib1/" -> "<lib1 commit>/docs"
"Lib2/" -> "<lib2 commit>/docs"
"App1/" -> "<app1 commit>/docs"


Special customer release for a specific HW platform
---------------------------------------------------
"Lib1/lib1.h" -> "<lib1-commit>/headers/lib1.h"
"Lib1/lib1.so" -> "<build-environment-commit>/i386/Lib1/lib1.so"
"Lib1/docs" -> "<lib1-commit>/docs"
"App1_binary" -> "<build-environment-commit>/i386/App1/App1_binary"
"docs" -> "<app1-commit>/docs"

Commit, tag, and bag this and send it to the customer. If the customer
says something is broken, we can compute a SHA1 of the customer's tree
and immediately see whether there are changes that did not come from us.
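
A rough sketch of that check, in Python. Git itself would compare a
single tree SHA1; this simplified version hashes each file the way git
hashes a blob and compares two path-to-SHA1 manifests, so it also shows
which paths differ. It ignores file modes and symlinks, and the names
tree_manifest and foreign_changes are made up for illustration.

```python
import hashlib
import os


def blob_sha1(data: bytes) -> str:
    """SHA1 of file content as git hashes a blob: 'blob <size>\\0' + data."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def tree_manifest(root: str) -> dict:
    """Map each file under root (relative, '/'-separated) to its blob SHA1."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                manifest[rel] = blob_sha1(fh.read())
    return manifest


def foreign_changes(shipped: dict, customer: dict) -> list:
    """Paths that were added, removed, or modified relative to what we shipped."""
    paths = shipped.keys() | customer.keys()
    return sorted(p for p in paths if shipped.get(p) != customer.get(p))
```

Running tree_manifest over the released tree and over the customer's
copy, then passing both results to foreign_changes, gives an empty list
when the customer's tree is exactly what was sent.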


Now this can be broken in so many ways that I can't even count, so I'd
appreciate some feedback to set me straight.


On 12/9/06, R. Steve McKown <rsmckown@xxxxxxxxx> wrote:
On Saturday 02 December 2006 12:41 pm, Linus Torvalds wrote:
> In other words, I _suspect_ that that is really what module users are all
> about. They want the ability to specify an arbitrary collection of these
> atomic snapshots (for releases etc), and just want a way to copy and move
> those things around, and are less interested in making everything else
> very seamless (because most people are happy to do the actual
> _development_ entirely within the submodules, so the "development" part
> is actually not that important for the supermodule, the supermodule is
> mostly for aggregation and snapshots, and tying different versions of
> different submodules together).
>
> So that's where I come from. And maybe I'm totally wrong. I'd like to hear
> what people who actually _use_ submodules think.

Here are some thoughts on subprojects from my company's perspective.  I
apologize for the long message.

Abstract: We use submodules heavily in CVS and SVN.  I like what I've read
from Linus about the "thin veneer" approach of integrating subprojects.  It
seems conceptually to provide the support we desire.  For us, it's important
that the mandated linkage between a master project and a subproject is
minimal to maximize our flexibility in building our processes.


We develop and maintain a lot of embedded applications, both for higher-level
systems (ex: 32MB RAM/32MB storage) running the Linux kernel and a customized
set of libs/app support code, and for more deeply embedded environments (ex:
8KB of RAM and 32KB of storage).  Even though these two cases are very
different in many respects, the version management issues are the same.

- We (mostly) track everything needed to build historical versions of code
with 100% fidelity.  This includes all of the tools used to compile, build,
test, deploy, debug, etc. the actual build results themselves.  I initially
looked at Vesta several years ago.  I love their conceptual approach to this
problem (integrated build system that caches mid-level build results within
the repository itself), but it's too unwieldy, very hard to set up (lots of
up-front effort), and lacks many useful features.

- Most of our "applications" are a relatively small amount of app-specific
code with references to several/many shared modules.  Shared modules can
contain support tools, like build/test/debug/deploy support for a given
embedded platform, in-house developed shared app code, or shared code
developed by third parties.

- We use CVS to manage our larger system development projects.  The repo is
about 2GB and has several dozen application-code submodules.  We use the
"third party sources" approach to tracking submodules as outlined in Ch.13 of
the CVS manual.  Additionally, we manage our "buildbox" (similar to buildroot
in concept) in another CVS repo.  All prior interesting versions of the
buildroot can be built from source (toolchains, everything), if necessary.
Applications contain metadata (a file...) in the repo so the app-level build
system can ensure it is being run under the correct version of buildbox;
clunky but serviceable.  CVS is a nightmare because of its poor
branch/tagging facilities, and many of the things we *ought* to be doing with
revision control we don't because of the complexity.

- We use SVN to manage our deeply embedded system projects.  The repo is about
250MB in size.  Applications use the svn:externals property to reference
needed modules.  We aren't using a buildbox in this environment yet (bad!).
SVN's simple branching and svn:externals are a giant leap forward in
comparison to CVS's capabilities.


Below are some common use case scenarios that are to varying degrees unwieldy
in CVS and/or SVN.  Many of these, involving non-trivial branching and merging
operations, are nearly impractical in CVS, and the lack of merge tracking (to
support repeated safe merging from one branch to another) makes some of these
a bit tricky in SVN too.  Of course neither repo supports
disconnected/distributed operation, which would make a number of activities
that much simpler as well.

- Round trip module management.  A specific app requires a change to a shared
module, so it makes a local branch to develop the change.  The "diff" is
presented to the maintainer (who may be in-house).  The next interesting
maintainer version of the module gets imported into our repo (if in-house,
it's already there), where the app can reference it.  This merge process may
leave changes not yet implemented (or never to be implemented) by the module
maintainer in the local branch used by the apps.  Other apps are unaffected,
as they are linking to a prior version in the local branch.

- Pragmatic development.  It's typical that in developing an application, a
developer will need to simultaneously make changes to one or more submodules.
If more than trivial, he/she should branch the submodules and continually
track the HEAD of those branches in the relevant app.  This is so complex
and fraught with problems in CVS that it doesn't get done, and developers
house too much change over time in their working directories.  With SVN and
svn:externals, the process is workable.  It is nice that an svn:external can
point to (the HEAD of) a branch when making changes.

- An application implements a new feature internally (say support for a new
digital chipset in the embedded world) which later needs to be "promoted" to
a subproject for use by others.  Pretty easy in SVN.  A challenge in CVS;
it's really not possible to "convert" app code into a "third party source"
and retain an historical link.

- Updating build tools.  In concept no different than updating a shared code
module.  In practice, due to the buildbox strategy, it's a bit convoluted.  I
don't expect this to get much smoother.  Getting Vesta-like features, where
integrated build support can cache lower-level build results in a version-safe
manner (like the binary code built when the cross toolchain was built) would
be killer, but that's surely OT for the submodules discussion.

Thanks,
Steve
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
