On Mon, Aug 9, 2021 at 9:16 AM anatoly techtonik <techtonik@xxxxxxxxx> wrote:
>
> Hi.
>
> In https://lore.kernel.org/git/CAPkN8xK7JnhatkdurEb16bC0wb+=Khd=xJ51YQUXmf2H23YCGw@xxxxxxxxxxxxxx/T/#u
> it became clear that it is impossible to make fast-export followed
> by fast-import to get identical commit hashes for the resulting
> repository (try https://github.com/simons-public/protonfixes).
> It is also impossible to detect which commits would be altered
> as a result of this operation. Because fast-export/import does
> some implicit commit normalization, fixing that probably requires
> too much effort.
>
> As an alternative it appeared that that theres is also a
> "git binary stream" log that is produced by
>
>   git cat-file --batch --batch-all-objects
>
> Is there a way to reconstruct the repository given that stream?
> Is there documentation on how to read it?

Peff already responded about hash-object. And pointed you, again, to
the manual for cat-file.

Can I suggest an alternative, even if it changes the problem statement
slightly? For some reason you didn't like my
--reference-excluded-parents suggestion, but there's another way to do
this as well with fast-export and fast-import as they exist today: use
fast-export's --show-original-ids flag. With that flag, you'll know
the original hashes. And if your filtering process does not modify a
commit nor any of its ancestors, it can simply omit that commit (i.e.
not pass it along to fast-import) and replace any references to the
commit with a reference to the original hash.

So, for example, if the `git fast-export --show-original-ids ...`
output looked as follows (a simple repository with just three commits
for demonstration purposes):

"""
reset refs/heads/main
commit refs/heads/main
mark :1
original-oid 81b642ea15a614e84cdd52514a963735426ab06c
author Developer Name <developer@xxxxxxxx> 1628603376 -0400
committer Developer Name <developer@xxxxxxxx> 1628603376 -0400
data 35
First commit, which was gpg signed
M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 fileA

commit refs/heads/main
mark :2
original-oid 0024a18e9bfef3fd1091305cef4dd5a789164809
author Developer Name <developer@xxxxxxxx> 1628603396 -0400
committer Developer Name <developer@xxxxxxxx> 1628603396 -0400
data 14
Second commit
from :1
M 100644 f2e41136eac73c39554dede1fd7e67b12502d577 fileA
M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 fileB

commit refs/heads/main
mark :3
original-oid 96efb1173ad5c037f03f3639976f2465b1c58186
author Developer Name <developer@xxxxxxxx> 1628603422 -0400
committer Developer Name <developer@xxxxxxxx> 1628603422 -0400
data 13
Third commit
from :2
M 100644 f15bf479158b73b9bb79e158ce93d75190bc9597 fileA
M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 fileC
"""

Then we'd parse the first commit, decide we didn't want to filter it,
note that we hadn't filtered it or any of its parents, and then decide
to replace any references to ":1" (the stream's name for the
replacement for that commit) with
"81b642ea15a614e84cdd52514a963735426ab06c" (the original hash).

Then we'd parse the second commit. Perhaps on this one we decide we
want to remove fileB. So we output it after removing the fileB line,
and after replacing ":1" with the appropriate hash.

Then we'd parse the third commit. We decide we don't want to change
this one, but we did change the second commit (the one with
"mark :2"), which is its parent, so we still have to output it. It has
no direct references to ":1", so there is nothing to replace in it
either.
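
In case it helps, here is a rough Python sketch of the filtering step
walked through above. It is only an illustration, not something
fast-export or fast-import ship: the name filter_stream and the
drop_paths parameter are made up, it assumes the stream carries no
blob data (as in the example, where file contents are referenced by
hash), that paths are unquoted, that every "data" header uses the
byte-count form, and it does not handle the corner case where every
commit ends up omitted.

```python
import io

# Lines that begin a new top-level command (an assumed subset of the grammar).
TOP_LEVEL = (b"commit ", b"reset ", b"tag ", b"blob", b"done",
             b"progress ", b"feature ")


def filter_stream(stream, drop_paths=frozenset()):
    """Hypothetical filter: drop `M` lines for drop_paths, omit every commit
    that is unmodified and has no emitted ancestor, and point references to
    omitted commits at their original hashes."""
    src = io.BytesIO(stream)
    out = []        # chunks of the rewritten stream
    original = {}   # mark -> original oid, for every commit we omit
    commit = None   # commit being collected, or None between commits

    def flush():
        nonlocal commit
        if commit is None:
            return
        done, commit = commit, None
        if done["emit"] or not (done["mark"] and done["oid"]):
            out.extend(done["lines"])              # modified: pass it along
        else:
            original[done["mark"]] = done["oid"]   # untouched: omit it

    for line in iter(src.readline, b""):
        if line.startswith(TOP_LEVEL):
            flush()
            if line.startswith(b"commit "):
                commit = {"lines": [], "mark": None, "oid": None, "emit": False}
        sink = out if commit is None else commit["lines"]
        word, _, rest = line.rstrip(b"\n").partition(b" ")
        if word == b"data":
            # Copy the payload verbatim so message text is never parsed.
            sink.append(line + src.read(int(rest)))
        elif word in (b"from", b"merge"):
            ref = original.get(rest, rest)
            if commit is not None and ref.startswith(b":"):
                commit["emit"] = True   # parent was emitted, so we must be too
            sink.append(word + b" " + ref + b"\n")
        elif (commit is not None and word == b"M"
              and rest.split(b" ", 2)[-1] in drop_paths):
            commit["emit"] = True       # line dropped; commit is now modified
        else:
            if commit is not None and word == b"mark":
                commit["mark"] = rest
            elif commit is not None and word == b"original-oid":
                commit["oid"] = rest
            sink.append(line)
    flush()
    return b"".join(out)
```

Called as filter_stream(stream, {b"fileB"}) on the example, it should
omit the first commit, emit the second with its "from" line pointing
at the original hash and without the fileB line, and emit the third
unchanged.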

In the end, we'd pass this stream to fast-import:

"""
reset refs/heads/main
commit refs/heads/main
mark :2
original-oid 0024a18e9bfef3fd1091305cef4dd5a789164809
author Developer Name <developer@xxxxxxxx> 1628603396 -0400
committer Developer Name <developer@xxxxxxxx> 1628603396 -0400
data 14
Second commit
from 81b642ea15a614e84cdd52514a963735426ab06c
M 100644 f2e41136eac73c39554dede1fd7e67b12502d577 fileA

commit refs/heads/main
mark :3
original-oid 96efb1173ad5c037f03f3639976f2465b1c58186
author Developer Name <developer@xxxxxxxx> 1628603422 -0400
committer Developer Name <developer@xxxxxxxx> 1628603422 -0400
data 13
Third commit
from :2
M 100644 f15bf479158b73b9bb79e158ce93d75190bc9597 fileA
M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 fileC
"""

and the first commit would keep its original hash, as you wanted.

This does presume that you're importing into the original repository
(or a clone --mirror of it), because it expects certain hashes to
already exist. And when importing into such a repo, you want to use
--force with fast-import. But it should do what you're asking for,
without needing to do any extra work in fast-export or fast-import.
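
For completeness, a minimal sketch of how the export and import ends
could be wired together from Python, under the assumptions above
(importing back into the original repository, with --force). The names
round_trip, rewrite, and run are hypothetical, and `--no-data` and
`--all` are my guesses at the elided `...` in the command; adjust them
to whatever you actually export.

```python
import subprocess

# Flags named in this mail, plus --no-data (assumed) so blobs are referenced by hash.
EXPORT = ["git", "fast-export", "--show-original-ids", "--no-data"]
IMPORT = ["git", "fast-import", "--force"]


def round_trip(repo, rewrite, refs=("--all",), run=subprocess.run):
    """Export `refs` from `repo`, pass the stream through `rewrite`
    (a bytes -> bytes callable), and import the result into the same repo."""
    exported = run(EXPORT + list(refs), cwd=repo, check=True,
                   capture_output=True).stdout
    # Omitted commits are referenced by hash, so they must already exist
    # in the repository we import into.
    run(IMPORT, cwd=repo, check=True, input=rewrite(exported))
```

The `run` parameter is only there so the function can be exercised
without touching a real repository; in normal use you would leave it
at its default.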