Re: Feature request: provide a persistent IDs on a commit

On Mon, 2022-07-18 at 20:50 +0200, Ævar Arnfjörð Bjarmason wrote:
> On Mon, Jul 18 2022, Stephen Finucane wrote:
> 
> > ...to track evolution of a patch through time.
> > 
> > tl;dr: How hard would it be to retrofit a 'ChangeID' concept à la the
> > 'Change-Id' trailer used by Gerrit into git core?
> > 
> > Firstly, apologies in advance if this is the wrong forum to post a feature
> > request. I help maintain the Patchwork project [1], which is a web-based tool
> > that provides a mechanism to track the state of patches submitted to a mailing
> > list and make sure stuff doesn't slip through the cracks. One of our long-term
> > goals has been to track the evolution of an individual patch through multiple
> > revisions. This is a surprisingly hard goal because oftentimes there isn't a
> > whole lot to work with. One can try to guess whether things are the same by
> > inspecting the metadata of the commit (subject, author, commit message, and the
> > diff itself), but each of these metadata items is subject to arbitrary changes
> > and is therefore fallible.
> > 
> > One of the mechanisms I've seen used to address this is the 'Change-Id' trailer
> > used by Gerrit. For anyone who hasn't seen this, the Gerrit server provides a
> > git commit-msg hook that you can install locally. When installed, this appends a
> > 'Change-Id' trailer to each and every commit message. In this way, the evolution
> > of a patch (or a "change", in Gerrit parlance) can be tracked through time, since
> > the Change-Id provides an authoritative answer to the question "is this still
> > the same patch?". Unfortunately, there are still some obvious downsides to this
> > approach. Not only does this additional trailer clutter your commit messages, but
> > it's also something the user must install themselves. While Gerrit can insist
> > that this is installed before pushing a change, this isn't an option for any of
> > the common forges, nor is it something git-send-email supports.
> 
> git format-patch+send-email will send your trailers along as-is, so how
> doesn't it support Change-Id? Does it need some support that any other
> made-up trailer doesn't?

It supports sending the trailers, sure. What it doesn't support is insisting that
you send this specific trailer (Change-Id). Only Gerrit can enforce this (server
side, thankfully, which means you don't need to ask all contributors to install
this hook if you want to rely on it for tooling, CI, etc.).
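To make the hook mechanism concrete, here is a minimal Python sketch of what a commit-msg hook in the spirit of Gerrit's does: append a Change-Id trailer unless one is already present, so amending or rebasing keeps the same ID. This is not Gerrit's hook. The real one is a shell script that computes its ID differently, whereas the hypothetical `add_change_id` below just uses random bytes, and it skips many edge cases a real hook has to handle.

```python
import re
import secrets
import sys

# An existing Gerrit-style trailer: "Change-Id: I" + 40 hex digits.
CHANGE_ID_RE = re.compile(r"^Change-Id: I[0-9a-f]{40}$", re.MULTILINE)
# Loose "Key: value" trailer shape, used to detect an existing trailer block.
TRAILER_RE = re.compile(r"^[A-Za-z0-9-]+: .+$")


def add_change_id(message: str) -> str:
    """Return message with a Change-Id trailer appended, unless it has one."""
    if CHANGE_ID_RE.search(message):
        return message  # keep the existing ID, e.g. when amending
    # Assumption: a random 160-bit value; Gerrit's hook derives its ID differently.
    change_id = "I" + secrets.token_hex(20)
    body = message.rstrip("\n")
    paragraphs = body.split("\n\n")
    last = paragraphs[-1].splitlines()
    # Join an existing trailer block (e.g. Signed-off-by) instead of
    # starting a new paragraph after it.
    in_trailer_block = len(paragraphs) > 1 and all(
        TRAILER_RE.match(line) for line in last
    )
    sep = "\n" if in_trailer_block else "\n\n"
    return f"{body}{sep}Change-Id: {change_id}\n"


def main(argv: list[str]) -> None:
    # Git runs commit-msg hooks with the path of the message file as argv[1].
    path = argv[1]
    with open(path, encoding="utf-8") as f:
        message = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(add_change_id(message))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Installed as .git/hooks/commit-msg (executable, with a Python shebang line added), it would rewrite the message file in place. That is exactly the per-contributor setup step described above.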

> > I imagine most people working with mailing-list-based workflows have their own
> > client-side tooling to support this, while software forges like GitHub and
> > GitLab simply don't bother tracking version history between individual commits
> > in a pull/merge request.
> 
> It's far from ideal, but at least GitLab shows a diff on a push to an MR,
> including if it's force-pushed. I'm not sure about GitHub.

GitHub does not. Simply piling multiple additional "fix" commits onto the PR
branch gives a less horrible review experience, since you can maintain context,
albeit at the cost of a rotten git log. We don't need to debate the pros and
cons of the various forges, though :)

> 
> > IMO, though, it would be fantastic if third-party tools weren't necessary.
> > What I suspect we want is a persistent ID (or rather UUID) that never changes,
> > regardless of how many times a patch is cherry-picked, rebased, or otherwise
> > modified, similar to the Author and AuthorDate fields. Like Author and
> > AuthorDate, it would be part of the core git commit metadata rather than
> > something in the commit message like Signed-off-by or Change-Id.
> > 
> > Has such an idea ever been explored? Is it even possible? Would it be broadly
> > useful?
> 
> This has come up a bunch of times. I think that the thing git itself
> should be doing is to lean into the same notion that we use for tracking
> renames. I.e. we don't track them; we analyze history after the fact and
> spot the renames for you.

Any idea where I'd find previous discussions on this? I did look, and the only
proposal I found was an old one that seemed to suggest including the Change-Id
commit-msg hook with git itself, which is not what I'm suggesting here.

> We have some of that in git already, as git-patch-id, and more recently
> git-range-diff. Both are flawed in a bunch of ways, and it's easy to run
> into edge cases where they don't spot something that they "should"
> have, where "should" exists in the mind of the user.

That's a fair point and is of course what we (Patchwork) have to do currently.
Patchwork can track relations between individual patches but doesn't attempt to
generate these relations itself. Instead, we rely on third-party tooling; the
PaStA tool was one such example [1]. I can't imagine a tool like Gerrit ever
working without this concept of an authoritative (and arbitrary) identifier to
track a patch's identity through time, hence its reliance on the Change-Id
trailer.
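For comparison, the after-the-fact approach can be sketched as follows: hash a unified diff while ignoring hunk line numbers and whitespace, roughly in the spirit of git-patch-id, which by default ignores both. This is a simplified illustration, not git's algorithm. The function name `approx_patch_id` is made up, it will not reproduce the IDs `git patch-id` prints, and unlike git it hashes the whole diff as one stream, so file order matters.

```python
import hashlib
import re

# Hunk header such as "@@ -10,3 +12,4 @@"; the numbers shift whenever the
# surrounding file changes, so they are dropped before hashing.
HUNK_HEADER_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


def approx_patch_id(diff: str) -> str:
    """Hash a unified diff, ignoring hunk line numbers and all whitespace."""
    digest = hashlib.sha1()
    for line in diff.splitlines():
        if line.startswith("index "):
            continue  # blob IDs depend on the base the patch was made against
        line = HUNK_HEADER_RE.sub("@@", line)
        digest.update("".join(line.split()).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

The same patch applied at a different offset keeps its ID, but rewording a single added line during review yields a different one. That is the kind of fragility a persistent identifier would sidestep.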

Perhaps we could flip this on its head. What would be the _downsides_ of
providing a persistent, arbitrary identifier on a commit, similar to the Author
and AuthorDate fields? There's obviously some work involved in implementing it,
but assuming that was already done, what would break/be worse as a result?
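For illustration, once each revision carries such an identifier (today's Change-Id trailer stands in for it here), linking revisions reduces to a dictionary lookup rather than a heuristic. The sketch assumes each patch is available as a (version, commit message) pair. `group_revisions` is a hypothetical helper, not part of Patchwork, and anything without an ID is handed back for heuristics such as patch-id matching.

```python
import re
from collections import defaultdict

# A Gerrit-style trailer: "Change-Id: I" followed by 40 hex digits.
CHANGE_ID_RE = re.compile(r"^Change-Id:[ \t]*(I[0-9a-f]{40})[ \t]*$", re.MULTILINE)


def group_revisions(patches):
    """Group (version, message) pairs by Change-Id.

    Returns (groups, unmatched): groups maps each Change-Id to a list of
    (version, subject) tuples in input order; unmatched lists patches that
    carry no Change-Id and would need a heuristic fallback.
    """
    groups = defaultdict(list)
    unmatched = []
    for version, message in patches:
        lines = message.splitlines()
        subject = lines[0] if lines else ""
        match = CHANGE_ID_RE.search(message)
        if match:
            groups[match.group(1)].append((version, subject))
        else:
            unmatched.append((version, subject))
    return dict(groups), unmatched
```

Note that the subject line can change freely between versions; only the identifier ties the revisions together.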

Stephen

[1] https://rsarky.github.io/2020/08/10/pasta-patchwork.html