Re: Working with git binary stream

On Mon, Aug 9, 2021 at 9:16 AM anatoly techtonik <techtonik@xxxxxxxxx> wrote:
>
> Hi.
>
> In https://lore.kernel.org/git/CAPkN8xK7JnhatkdurEb16bC0wb+=Khd=xJ51YQUXmf2H23YCGw@xxxxxxxxxxxxxx/T/#u
> it became clear that it is impossible to make fast-export followed
> by fast-import to get identical commit hashes for the resulting
> repository (try https://github.com/simons-public/protonfixes).
> It is also impossible to detect which commits would be altered
> as a result of this operation. Because fast-export/import does
> some implicit commit normalization, fixing that probably requires
> too much effort.
>
> As an alternative, it appears that there is also a
> "git binary stream" log that is produced by
>
> git cat-file --batch --batch-all-objects
>
> Is there a way to reconstruct the repository given that stream?
> Is there documentation on how to read it?

Peff already responded about hash-object, and pointed you, again, to
the manual for cat-file.
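
For what it's worth, here is a minimal sketch of reading that stream, assuming the default `--batch` output described in the cat-file manual: a header line `<oid> <type> <size>`, then the raw contents, then a single LF. The function names are made up for illustration, and the id check assumes a SHA-1 repository. It only shows how the stream relates to hash-object, not how to rebuild a repository.

```python
import hashlib
import io


def read_batch_objects(stream):
    """Yield (oid, type, content) tuples from `git cat-file --batch` output."""
    while True:
        header = stream.readline()
        if not header:
            return
        # Header line: "<oid> <type> <size>\n"
        oid, obj_type, size = header.decode("ascii").split()
        content = stream.read(int(size))
        stream.read(1)  # each object's contents are followed by one LF
        yield oid, obj_type, content


def object_id(obj_type, content):
    """Recompute a SHA-1 object id from "<type> <size>\\0" plus the contents."""
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


# The empty blob (e69de29...) that also appears in the example stream
# further down in this mail.
sample = io.BytesIO(b"e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 blob 0\n\n")
for oid, obj_type, content in read_batch_objects(sample):
    assert object_id(obj_type, content) == oid
```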

Can I suggest an alternative, even if it changes the problem statement
slightly?  For some reason you didn't like my
--reference-excluded-parents suggestion, but there's another way to do
this as well with fast-export and fast-import as they exist today: use
fast-export's --show-original-ids flag.  With that flag, you'll know
the original hashes.  And if your filtering process does not modify a
commit nor any of its ancestors, it can simply omit that commit (i.e.
not pass it along to fast-import) and replace any references to the
commit with a reference to the original hash.  So, for example if the
`git fast-export --show-original-ids ...` output looked as follows (a
simple repository with just three commits for demonstration purposes):

"""
reset refs/heads/main
commit refs/heads/main
mark :1
original-oid 81b642ea15a614e84cdd52514a963735426ab06c
author Developer Name <developer@xxxxxxxx> 1628603376 -0400
committer Developer Name <developer@xxxxxxxx> 1628603376 -0400
data 35
First commit, which was gpg signed
M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 fileA

commit refs/heads/main
mark :2
original-oid 0024a18e9bfef3fd1091305cef4dd5a789164809
author Developer Name <developer@xxxxxxxx> 1628603396 -0400
committer Developer Name <developer@xxxxxxxx> 1628603396 -0400
data 14
Second commit
from :1
M 100644 f2e41136eac73c39554dede1fd7e67b12502d577 fileA
M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 fileB

commit refs/heads/main
mark :3
original-oid 96efb1173ad5c037f03f3639976f2465b1c58186
author Developer Name <developer@xxxxxxxx> 1628603422 -0400
committer Developer Name <developer@xxxxxxxx> 1628603422 -0400
data 13
Third commit
from :2
M 100644 f15bf479158b73b9bb79e158ce93d75190bc9597 fileA
M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 fileC
"""

Then we'd parse the first commit, decide we didn't want to filter it,
note that we hadn't filtered it or any of its parents, and then decide
to replace any references to ":1" (the stream's name for the
replacement for that commit) with
"81b642ea15a614e84cdd52514a963735426ab06c" (the original hash).

Then we'd parse the second commit.  Perhaps on this one we decide we
want to remove fileB.  So we output it after removing the fileB line,
and after replacing ":1" with the appropriate hash.

Then we'd parse the third commit.  We decide we don't want to change
this one, but we did change its parent, the second commit (the one
with "mark :2"), so its hash will change and we still have to output
it.  It has no direct references to ":1", so there is nothing to
replace in it.

In the end, we'd pass this stream to fast-import:

"""
reset refs/heads/main
commit refs/heads/main
mark :2
original-oid 0024a18e9bfef3fd1091305cef4dd5a789164809
author Developer Name <developer@xxxxxxxx> 1628603396 -0400
committer Developer Name <developer@xxxxxxxx> 1628603396 -0400
data 14
Second commit
from 81b642ea15a614e84cdd52514a963735426ab06c
M 100644 f2e41136eac73c39554dede1fd7e67b12502d577 fileA

commit refs/heads/main
mark :3
original-oid 96efb1173ad5c037f03f3639976f2465b1c58186
author Developer Name <developer@xxxxxxxx> 1628603422 -0400
committer Developer Name <developer@xxxxxxxx> 1628603422 -0400
data 13
Third commit
from :2
M 100644 f15bf479158b73b9bb79e158ce93d75190bc9597 fileA
M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 fileC
"""

and the first commit, which is never passed to fast-import, would keep
its original hash as you wanted.

This does presume that you're importing into the original repository
(or a clone --mirror of it), because it expects certain hashes to
already exist.  And when importing into such a repo, you want to use
--force with fast-import.  But it should do what you're asking for,
without needing to do any extra work in fast-export or fast-import.


