[RFE] Add minimal universal release management capabilities to GIT

nicolas.mailhot@xxxxxxxxxxx · Fri, 20 Oct 2017 12:40:39 +0200 (CEST)

Hi,

Git is a wonderful tool, which has transformed how software is created, and made code sharing and reuse a lot easier (both between humans and between software tools).

Unfortunately, Git is so good that more and more developers procrastinate on any activity that happens outside of Git, starting with cutting releases. The meme "one only needs a git commit hash" is going strong, even infecting institutions like LWN and glibc (https://lwn.net/SubscriberLink/736429/e5a8c8888cc85cc8/).

However, the properties that make a commit hash terrific at the local development level also make it suboptimal as a release ID:

– hashes are not ordered. Neither a human nor a tool can guess the sequencing of two hashes without access to Git history. Just try to handle "critical security problem in project X, introduced with version Y and fixed in Z" when all you have is some git hashes. Relying on hashes alone introduces severe friction when analysing deployment states.

– hashes are not ranked. You can not guess, looking at a hash, whether it corresponds to a project stability point or sits in the middle of a refactoring sequence, where things are expected to break. Evaluating every hash of every project you use quickly becomes prohibitive, with the only possible strategy being to just use the latest commit at a given time and pray (and, if you are lucky, never update afterwards unless you have lots of fixing and testing time to waste).

– commit mixing is broken by design. One can not adapt the user of a piece of code to changes in this piece of code before those changes are committed in the first place. There will always be moments where the latest commit of a project is incompatible with the latest commit of downstream users of this project. It is not a problem in developer environments and automated testers, where you want things to break early and be fixed early. It is a huge problem when you follow the same early commit push strategy for actual production code, where failures are not just a red light in a build farm dashboard, but have real-world consequences. And the more interlinked git repositories you pile on one another, the higher the probability that two commits won't work with one another, with failures cascading down.

– commits are too granular. Even assuming one could build an automated regression farm powerful enough to build and test every commit instantaneously, it is not possible to instantaneously push those rebuilds to every instance where this code is deployed (even with infinite bandwidth, infinite network reach and infinite network availability). Computers would be spending their time resetting to the latest build of one component or another, with no real work being done. So there will always be a distance between the latest commit in a git repo and what is actually deployed. And, as seen above, bare hashes make evaluating this distance difficult.

– commits are a bad inter-project synchronisation point. There are too many of them, they are not ranked, and everyone chooses a different commit to deploy. That effectively kills the network effects that helped make traditional releases solid (because distributors used the same release state, and could share feedback and audit results).
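
To illustrate the first point: assuming release IDs are dot-separated numbers (the format suggested further down in this message), a tool could answer the "introduced with version Y and fixed in Z" question without any access to Git history. The following is only a minimal Python sketch; parse_version and is_affected are hypothetical names, not an existing Git or library API.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dot-separated numeric version such as "1.4.2" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_affected(deployed: str, introduced: str, fixed: str) -> bool:
    """Return True if `deployed` lies in the half-open range [introduced, fixed).

    Tuples compare element by element, so the ordering is derived from the
    version strings alone. No repository access is needed.
    """
    return parse_version(introduced) <= parse_version(deployed) < parse_version(fixed)


if __name__ == "__main__":
    # Hypothetical advisory: problem introduced in 1.3.0, fixed in 1.5.0.
    for deployed in ("1.2.7", "1.4.2", "1.5.0", "1.10.0"):
        print(deployed, is_affected(deployed, "1.3.0", "1.5.0"))
```

Nothing comparable can be written for two bare commit hashes: the same check would need the full history of the repository to decide which one comes first.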

One could mitigate those problems in a Git management overlay (and, indeed, many try). The problem with those overlays is that they have variable maturity levels, make incompatible choices, cut corners, and are not universal like Git. That makes building anything on top of them of dubious value, with a quick fallback to commit hashes, which *are* universal among Git repos. Release handling and versioning really needs to happen in Git itself to be effective.

Please please please add release handling and versioning capabilities to Git itself. Without them some enthusiastic Git adopters are on a fast trajectory to unmanageable hash soup states, even if they do not realise it yet, because the deleterious side effects of giving up on releases only become clear with time.

Here is what such capabilities could look like (people on this list can probably invent something better, I don't care as long as something exists).

1. "release versions" are first class objects that can be attached to a commit (not just freestyle tags that look like versions, but may be something else entirely). Tools can identify release IDs reliably.

2. "release versions" have strong format constraints, which allow humans and tools to deduce their ordering without needing access to something else (full git history or project-specific conventions). The usual string of numbers separated by dots is probably simple and universal enough (if you start to allow letters people will try to use clever schemes like alpha or roman numerals, which break automation). There needs to be at least two numbers in the string to allow tracking patchlevels.

3. several such objects can be attached to a commit (a project may wish to promote a minor release to a major one after it passes QA, and versioning history should not be lost).

4. absent human intervention, the release state of a repo is initialised at 0.0 for its first commit (tools can rely on at least one release existing in a repo).

5. a command, such as "git release", allows a human with control of the repo to set an explicit release version on a commit. Git enforces ordering (refuses versions lower than the latest repo version in git history). The most minor number of the explicit release is necessarily zero.

6. a command, such as "git release" without argument, allows a human to request setting of a minor patchlevel release version for the current commit. The computed version is:
   "last release version in git history except most minor number"
 + "."
 + "number of commits in history since this version"
(patchlevel versioning is predictable and decentralized, credits to Willy Tarreau for the idea)

7. a command, such as "git release bump", allows a human to request setting of a new non-patchlevel release version. The computed version is
   "last release version in git history except most minor number, incrementing the remaining most minor number"
 + "."
 + "0"

8. a command, such as "git release promote", allows a human to request setting a new more major release version. The computed version is
   "last release version in git history except most minor number, incrementing the next-to-remaining-most-minor-and-non-zero number, and resetting the remaining-most-minor-and-non-zero number to zero"
 + "."
 + "0"

9. a command, such as "git release cut", creates a release archive, named reponame-releaseversion.tar.xz, with a reponame-releaseversion root directory and a reponame-releaseversion/VERSION file containing releaseversion (so automation like makefiles can synchronize itself with the release version state), removing git metadata (.git tree) from the result. If the current commit has several release objects attached, the highest one in ordering is chosen. If the current commit is lacking a release object, a new minor patchlevel release version is autogenerated. The archive compression format can be overridden in repo config.

10. a command, such as "git release translate", outputs the commit hash associated with the version given in argument if it exists, the version associated with the commit hash given in argument if it exists, or the version associated with the current commit when called without argument. If it is translating a commit hash with no version, it outputs the various versions that could be computed for this hash by git release, git release bump and git release promote. This is necessary to bridge developer-oriented tools, which will continue to talk in commit hashes, and release/distribution/security-audit oriented tools, which want to manipulate release versions.

11. when no releasing has been done in a repo for some time (I'd suggest 3 months to balance freshness with churn, and it can be user-overridable in repo config), git reminds its human masters at the next commit event that they should think about stabilizing state and cutting a release.
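
The version arithmetic of items 5 to 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposed implementation: all function names are hypothetical, the commit count of item 6 is passed in by the caller (how Git would count it is left open), and the reading of item 8 for versions with no more-major number left to increment (an error) is my own interpretation.

```python
from typing import Tuple

Version = Tuple[int, ...]


def parse(text: str) -> Version:
    """Parse a release version: at least two dot-separated numbers (item 2)."""
    parts = text.split(".")
    if len(parts) < 2 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a valid release version: {text!r}")
    return tuple(int(p) for p in parts)


def fmt(version: Version) -> str:
    return ".".join(str(n) for n in version)


def check_explicit(new: Version, latest: Version) -> None:
    """Item 5: an explicit release ends in zero and must not go backwards."""
    if new[-1] != 0:
        raise ValueError("most minor number of an explicit release must be 0")
    if new < latest:
        raise ValueError(f"{fmt(new)} is lower than {fmt(latest)}")


def patchlevel(last: Version, commits_since: int) -> Version:
    """Item 6: keep everything but the most minor number, append the commit count."""
    return last[:-1] + (commits_since,)


def bump(last: Version) -> Version:
    """Item 7: drop the most minor number, increment the next one, append 0."""
    head = last[:-1]
    return head[:-1] + (head[-1] + 1, 0)


def promote(last: Version) -> Version:
    """Item 8: increment the number left of the last non-zero one, zero the latter."""
    head = list(last[:-1])
    nonzero = [i for i, n in enumerate(head) if n != 0]
    if not nonzero or nonzero[-1] == 0:
        # Assumption: with nothing more major to increment, refuse.
        raise ValueError(f"cannot promote {fmt(last)}")
    i = nonzero[-1]
    head[i - 1] += 1
    head[i] = 0
    return tuple(head) + (0,)


if __name__ == "__main__":
    last = parse("1.2.0")
    print(fmt(patchlevel(last, 5)))  # 5 commits after 1.2.0
    print(fmt(bump(last)))
    print(fmt(promote(last)))
```

With 1.2.0 as the last release, this yields 1.2.5 for a patchlevel release five commits later, 1.3.0 for a bump and 2.0.0 for a promote. Starting from the default 0.0 state of item 4, a bump gives 1.0.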

So nothing terribly complex, just a lot of small helpers to make releasing easier, less tedious, and cheaper for developers. They formalize and automate the existing practices of mature software projects, making them accessible to smaller projects. They would make releasing more predictable and reliable for people deploying the code, and make releases easier to consume by higher-level cross-project management tools. That would transform the deployment stage of software just like Git already transformed the early code writing and autotest stages.

Best regards,

-- 
Nicolas Mailhot