On Thu, Oct 31, 2019 at 02:21:18PM +0100, Olaf Hering wrote:
> On Thu, 31 Oct 2019 11:15:39 +0100,
> SZEDER Gábor <szeder.dev@xxxxxxxxx> wrote:
>
> > However, I don't know how to tell about the skiplist file to GitHub,
> > or any other Git hosting service for that matter.
>
> Thanks for all the details.
>
> Is there a way to "replay" a git repository, so that all the commit contents
> and author/committer data are preserved? I think it is more important to have
> a clean repository than to preserve irrelevant commit hashes.

Those commits can be fixed by simply transforming the fast-export
stream, e.g.:

  $ git init new
  $ git -C virt-top/ fast-export --all |
    sed -e '/^\(author\|committer\) Richard W\.M\. Jones <rjones@xxxxxxxxxx> </ s/<"Richard W\.M\. Jones <rjones@xxxxxxxxxx>"> //' |
    git -C new fast-import

BUT!  All the usual warnings about rewriting already published history
apply.  The hashes of a couple of commits from 2009 might seem
irrelevant now, a decade later, but after correcting those author and
committer lines the hashes of all subsequent commits will inherently
change as well.  This is, in general, upsetting for everyone who has
cloned the repo and built their own work on top of it.

Furthermore, some commit messages refer to older commits by their hash
(e.g. in 431dbd98ba: "Simplifies and updates commit
dbef8dd3bf00417e75a12c851b053e49c9e1a79e"); those references will go
stale after rewriting history, unless you put in extra work to update
them.

I would advise against rewriting history.
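
For anyone who wants to see what that sed expression does before pointing it at a real fast-export stream, here is a small sketch that runs it on sample lines only. It rests on a few assumptions: the malformed ident lines have the shape the sed pattern implies, the address rjones@example.com and the timestamps are placeholders, and fix_ident is a hypothetical helper name. I also swapped the GNU-style `\|` alternation for `sed -E`, which should behave the same on both GNU and BSD sed.

```shell
#!/bin/sh
# fix_ident (hypothetical helper): drop the duplicated, quoted ident from
# author/committer lines. Only lines matching the address are modified.
fix_ident() {
    sed -E '/^(author|committer) Richard W\.M\. Jones <rjones@example\.com> </ s/<"Richard W\.M\. Jones <rjones@example\.com>"> //'
}

# Assumed shape of a malformed line; address and timestamp are placeholders.
bad='author Richard W.M. Jones <rjones@example.com> <"Richard W.M. Jones <rjones@example.com>"> 1256000000 +0000'
# A well-formed line that must pass through unchanged.
good='committer Someone Else <someone@example.com> 1256000000 +0000'

fixed=$(printf '%s\n' "$bad" | fix_ident)
untouched=$(printf '%s\n' "$good" | fix_ident)

printf '%s\n%s\n' "$fixed" "$untouched"
```

Since only lines matching the address are rewritten, every other author or committer line passes through untouched. The caveats about changed hashes apply all the same once the filtered stream is imported.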