Hi, On Tue, Mar 21, 2023 at 4:21 AM Lundkvist Per <per.lundkvist@xxxxxxxxxxxxx> wrote: > > Hi, > > We are investigating adding commit and tag signatures into our existing > repositories. We currently use the common workflow of developers merging commits > to master using an internal git hosting service after having passed code > review. Non-local merges like this would then be unsigned. > > But it seems like if we allow unsigned empty merge commits, i.e. those that > themselves do not introduce any any other change than what its parents > introduce, and require all other commits to be properly validated, then we can > safely validate the whole repository? What does "validate" mean in this context? Junio asked a bit about this. An alternative he didn't bring up is perhaps you are trying to protect against supply chain attacks, with a scenario such as someone gains remote access to some computers on your system, but not to any gpg keys (because developers are using gpgs on yubikey or something like that). In such a case, you might ask the question of whether you can determine if such an attacker has inserted additional commits to your history. You can use gpg-signed commits by known gpg-keys to rule out most commits as being from a potential attacker, but odds are your git hosting service does merges but doesn't sign them. You want a way to differentiate between merges it makes and ones an attacker might sneak in. And you are trying to equate "unsigned empty merge commit" with "was a clean merge anyway, and we're not worried about any permutation of clean merges of signed code". Or maybe you mean something completely different by "validate". Junio brushed over whether you could assume "unsigned empty merge commit" is equivalent to "was a clean merge", so I'll focus on that. You often can make such an assumption, but it's not valid in general. > A simple naive example of this would look something like this: > > rc=0 > tags=$(git for-each-ref --format '%(objectname)' refs/tags) > tags_verified=$(for i in $tags; do git verify-tag --format='%(objectname)' "$i"; done) > > for i in $(git rev-list HEAD --no-merges --not $tags_verified); do > git verify-commit "$i" || rc=1 > done > > for i in $(git rev-list HEAD --merges --not $tags_verified); do > diff=$(git show --text --pretty=format: --diff-merges=cc "$i") > git verify-commit "$i" || [ ! "$diff" ] || rc=1 > done > > exit $rc > > Or is this a faulty strategy? It's faulty, but it'd only rarely trip you up. First off, I would use --remerge-diff over --cc. --remerge-diff was designed to show whether the user made changes relative to what a current automatic remerge of the parents would give (so it shows whether and how they resolved conflicts and if and what other changes they added in). --cc comes close when you only want to know if the merge was clean, but still isn't quite the same. One example I can think of is that if there is a modify/delete conflict that the user resolves in favor of keeping the file, then the merge obviously wasn't clean. But $(git show --pretty=format: --cc $i) will come back empty (leading you to think the merge was clean) while $(git show --pretty=format: --remerge-diff $i) will show there was a modify/delete conflict. Second, even with remerge-diff, it is only checking whether a merge with today's merge algorithm and config settings would be clean. Changing diff.algorithm could in rare cases affect whether commits can be merged cleanly -- and the default in the past was "myers" whereas nowadays for merge specifically it is "histogram". Also, there have been changes to both of these algorithms in the past (one notable example being the introduction of diff.indentHeuristic and later turning it on by default), and there may be more such changes in the future. Similarly, merge.directoryRenames was essentially "false" before it was introduced, then was "true" for a while, then defaulted to "conflict". Someone could have made a merge with an old version of git that used either "false" or "true" for merge.directoryRenames and have it be clean, but when you go to remerge the same commits today you get a conflict due to that variable defaulting to "conflict". Further, the person running this script may have various `diff` or `merge` config settings globally that differ from those used by whoever or whatever did the past merges. And, you have to account for the fact that the merge might have been made with something other than git. GitHub and GitLab used to use libgit2, which had an entirely different merge algorithm. Gerrit uses jgit, which has its own merge algorithm. Realistically, there isn't that much difference between all these algorithms. For nearly all merges, any of these merge algorithms with any of these options will give the exact same answer...but "nearly all" != "all". So, there is no guarantee. And we can't and won't even guarantee it going forward even if you stick with Git and somehow ensure using the same config settings either, because we may make further changes to diff and merge algorithms in the future, which may affect the "clean remergeability" (or whatever you call it) of merges you make today. All that said, the odds that you discover a particular merge in the wild where any of this matters to your usecase is probably small. So you might get away with it...but if you try it, you should probably add some good comments to the code for whoever comes along and investigates a "bug" in the future, to let them know that false negatives and false positives are possible with your checks, but that you just assumed they would be rare enough that you decided to ignore them. Finally, just a couple preference questions: I'd have used [ -z "$diff" ] rather than [ ! "$diff" ], and a simple "exit 1" rather than "rc=1" to exit early, but maybe you really want all the output. But if you do want the output, would it make more sense to have the [[ -z "$diff"]] before the git verify-commit "$i"? Also, I'd be tempted to use "git log --format='%h %G? %s' $RANGE" rather than call git verify-commit N times, and then post-process all the "N" cases, but what you have works too.