Re: Round-tripping fast-export/import changes commit hashes

On Sun, Dec 11, 2022 at 10:30 AM anatoly techtonik <techtonik@xxxxxxxxx> wrote:
>
> On Tue, Aug 10, 2021 at 8:58 PM Elijah Newren <newren@xxxxxxxxx> wrote:
> >
> > On Tue, Aug 10, 2021 at 8:51 AM anatoly techtonik <techtonik@xxxxxxxxx> wrote:
> > >
> > > On Mon, Aug 9, 2021 at 9:15 PM Elijah Newren <newren@xxxxxxxxx> wrote:
> > > >
> >
> > [2] https://lore.kernel.org/git/CABPp-BH4dcsW52immJpTjgY5LjaVfKrY9MaUOnKT3byi2tBPpg@xxxxxxxxxxxxxx/
> >
> > Signed commits is just one issue, and you'll have to add special code
> > to handle a bunch of other special cases if you go down this route.
> > I'd rephrase the problem.  You want to know when _your tool_ (e.g.
> > reposurgeon since you refer to it multiple times; I'm guessing you're
> > contributing to it?) has not modified a commit or any of its
> > ancestors, and when it hasn't, then _your tool_ should remove that
> > commit from the fast-export stream and replace any references to it by
> > the original commit's object id.  I outlined how to do this in [2],
> > referenced above, making use of the --show-original-ids flag to
> > fast-export.  If you do that, then for any commits which you haven't
> > modified (including not modifying any of its ancestors), then you'll
> > keep the same commits as-is with no stripping of gpg-signatures or
> > canonicalization of objects, so that you'll have the exact same commit
> > IDs.  Further, you can do this today, without any changes to git
> > fast-export or git fast-import.
>
> Took me a while to process the reply. Let's recap.
>
> I want to make a roundtrip export/import of
> https://github.com/simons-public/protonfixes which should get exactly
> the same repository.

As I've stated a few times in the thread, this request of yours is
simply impossible for general repositories ([1] contains the best
summary of the reasons).  For the specific repository in question, the
only relevant roadblock is the presence of a signed commit which
happens to be a root commit.  That opens the door to some workarounds
that could be used with this specific repository.

[1] https://lore.kernel.org/git/CABPp-BGDB6jj+Et44D6D22KXprB89dNpyS_AAu3E8vOCtVaW1A@xxxxxxxxxxxxxx/

I provided two workarounds you could try to use for your specific case
at [2] and [3], one of which you ask about below.

[2] https://lore.kernel.org/git/CABPp-BE=9wzF6_VypoR-uEPHsLWdV7zyE13FOgLK0h8NOcMz3g@xxxxxxxxxxxxxx/
[3] https://lore.kernel.org/git/CABPp-BH4dcsW52immJpTjgY5LjaVfKrY9MaUOnKT3byi2tBPpg@xxxxxxxxxxxxxx/

> # --- fast-export to exported.txt
> git clone https://github.com/simons-public/protonfixes
> git -C protonfixes fast-export --all > exported.txt
> # --- check revision of the repo
> git -C protonfixes rev-parse HEAD
> # 681411ba8ceb5d2d790e674eb7a5b98951d426e6
>
> # --- fast-import into new repo
> git init newrepo
> git -C newrepo fast-import < exported.txt
> # --- checking revision of the new repo
> git -C newrepo rev-parse HEAD
> # 9888762d7857d9721f0c354e7fc187a199754a4b
>
> Hashes don't match. The roundtrip fails.

As expected, given that one of your commits is signed.

> Let's see if --reference-excluded-parents helps.
>
> # --- export below produces the same export stream as above
> git -C protonfixes fast-export --reference-excluded-parents --all > exported_parents.txt

--reference-excluded-parents only has effect if there are excluded
parents.  You didn't exclude any parents, so obviously adding this
flag isn't going to change anything.  You should instead first
clone/fetch the part of history up to the commits you want to keep
intact (e.g. the signed commits), and then run a command like
   git -C protonfixes fast-export --reference-excluded-parents \
       ^${BASECOMMIT1} ^${BASECOMMIT2} ^${BASECOMMITN} --all \
       | tee exported_only_newer_history.txt \
       | git -C newrepo fast-import

Note that the examples I gave you (e.g. [2] above) all used some
excluded references (e.g. "^master~5").

> Because fast-import/fast-export don't work

You have not yet identified a bug in either, so I disagree with this comment.

>, you propose to keep the old
> repo around until it is clear which commits I am going to modify.


The framing of this statement looks odd to me.  You have posed your
problem in the form of doing some kind of export/import operation,
which is fine.  However, in order to do an export operation, you
obviously need the repository in order to export it.  So why are you
calling out that you keep the repo around until you run the
fast-export command?

Anyway, that aside...

I was just saying that
  (1) signed commits exist as a way to assure other users that
the commits have not been modified
  (2) fast-export and fast-import exist to allow you to modify history
in some fashion (and are separate steps so people can edit the stream
between running the two commands)
  (3) the above two imply that if you still want users to be able to
verify the signed commits, those signed commits should NOT be sent
through fast-export and fast-import
  (4) therefore, if you want the signed commits kept as-is, you should
simply fetch the history up to and including those, and only send the
remainder of the history through fast-export/fast-import.

But I will add here one additional thing:

If you're weaving repositories together, that likely changes the
parent(s) of some of the commits.  Once you change the parent(s) of a
commit, that alone changes the commit and invalidates any signature it
has.  In your case you seem to only have a root commit that is signed,
and if you keep that signed commit as a root commit, you can avoid
this problem.  But, in general, if signed commits are involved in the
weaving such that they gain new parents, then what you want to do is
simply impossible; you will not be able to keep the signatures in such
a case (and the commit ids will change as well).

> Then
> make a new fast-export starting from the first commit I am going to
> modify with --reference-excluded-parents flag. Is that correct so far?

You have the basic idea, but you are making things excessively complex
with one detail here -- it does not need to start with the first
commit you are going to modify; it can start earlier.  You can simply
export all commits after the one(s) you know you don't want to change.
For example, if the history looks like this:

A---B---C---D---E---F

and commits A and B are the only signed commits (which you want to
preserve) and commit D is the first one you are going to modify, you
could still run fast-export on "^A ^B F" (i.e. C, D, E, and F in this
case) -- that will also include C, but C isn't signed and round-trips
without problems, so it doesn't hurt to include it.
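
As a minimal sketch of the exclusion above, the following builds a throwaway six-commit history and checks which commits "^A ^B" leaves in the export. The temporary repository, the tag names A and B, the branch name main, and the demo identity are all made up for illustration, and nothing here is actually signed.

```shell
set -eu
# Hypothetical identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main

# Linear history A---B---C---D---E---F; each commit gets a tag named after it.
for c in A B C D E F; do
    echo "$c" > "$repo/file.txt"
    git -C "$repo" add file.txt
    git -C "$repo" commit -q -m "$c"
    git -C "$repo" tag "$c"
done

# Commits selected by "^A ^B main": everything reachable from main
# except A, B, and their ancestors.
selected=$(git -C "$repo" rev-list --count ^A ^B main)

# fast-export writes one "commit" record per selected commit.
exported=$(git -C "$repo" fast-export --reference-excluded-parents ^A ^B main \
    | grep -c '^commit ')

echo "rev-list: $selected, fast-export: $exported"
```

Both counts should cover C through F only; A and B never enter the stream, so nothing can rewrite them.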

> Then given this partial export and old repo, how to init the new repo
> that fast-import can apply its tail there?

Flag the signed commit(s) with a branch or branches of some sort, then
fetch just those branches into the new repo.

> What if I have multiple commits that I modify, but I don't know which
> of their parents was first?

I wouldn't bother trying to figure out which one(s) is/are first.  (I
mean, you could do some revision walking to figure that out, but then
you'd have to fetch not just the history of the signed commits you
want to keep, but everything prior to whichever commit(s) you want to
modify first.)

Instead, I'd just do the easier thing I noted above -- use the signed
commits as exclusion markers.

> And when I touch commits from different
> branches, how to recreate their parent history intact in one repo?

Place temporary branches pointing directly to each of the signed
commits you want to keep intact (which also implies you are keeping
all the history behind those commits intact as well), then run:

git -C newrepo fetch PATH_OR_URL_OF_OLD_REPO ${TEMPBRANCH1} \
    ${TEMPBRANCH2} ${TEMPBRANCHN}

Then use the earlier suggestion of

git -C protonfixes fast-export --reference-excluded-parents \
    ^${TEMPBRANCH1} ^${TEMPBRANCH2} ^${TEMPBRANCHN} --all \
    | tee exported_only_newer_history.txt \
    | git -C newrepo fast-import

to get the remainder of the history exported/imported.
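
The two steps can be tried end to end on a toy repository. This is only a sketch under assumptions: the repository paths, the branch names main and keep, and the demo identity are hypothetical, the "protected" commit B is not really signed, and a git new enough to have --reference-excluded-parents is assumed.

```shell
set -eu
# Hypothetical identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
old="$work/old"
new="$work/new"

git init -q "$old"
git -C "$old" symbolic-ref HEAD refs/heads/main
for c in A B C D E F; do
    echo "$c" > "$old/file.txt"
    git -C "$old" add file.txt
    git -C "$old" commit -q -m "$c"
    # Pretend B is the last commit that must stay intact and mark it
    # with a temporary branch.
    if [ "$c" = B ]; then git -C "$old" branch keep; fi
done

git init -q "$new"

# Step 1: copy the protected history (A and B) as-is with a plain fetch.
git -C "$new" fetch -q "$old" keep

# Step 2: send only the newer history through fast-export/fast-import.
# The first exported commit refers to its excluded parent by object id.
git -C "$old" fast-export --reference-excluded-parents ^keep --all \
    | git -C "$new" fast-import --quiet

old_head=$(git -C "$old" rev-parse refs/heads/main)
new_head=$(git -C "$new" rev-parse refs/heads/main)
echo "old: $old_head"
echo "new: $new_head"
```

Because the stream here is not edited between export and import, the two ids are expected to match; in real use the stream would be modified in between, and only the fetched commits keep their original ids.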



I will also add that since you are interested in attempting to
round-trip through fast-export/fast-import and still end up with the
same hashes (ignoring a few fundamental shortcomings mentioned earlier
in this thread that won't always permit this to work), you can at
least get closer by adding "--reencode=no" to fast-export (so that it
doesn't alter commit messages) and setting core.ignorecase=false for
at least the fast-import invocation (so that fast-import doesn't make
files which differ in case only clobber each other while importing).
But, again, that only addresses about two issues out of half a dozen.
Again, see the link at [1] earlier in this email.
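
A sketch of those two settings on a toy repository follows. The paths, the branch name main, the demo identity, and the ISO-8859-1 commit message are made up for illustration, and it assumes a git that supports --reencode (2.23 or later, as far as I know).

```shell
set -eu
# Hypothetical identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
old="$work/old"
new="$work/new"

git init -q "$old"
git -C "$old" symbolic-ref HEAD refs/heads/main
echo hello > "$old/file.txt"
git -C "$old" add file.txt

# A commit whose message is stored as ISO-8859-1 (byte 0xE9 is "e acute"),
# so the commit object carries an "encoding" header.
printf 'caf\351\n' \
    | git -C "$old" -c i18n.commitEncoding=ISO-8859-1 commit -q -F -

git init -q "$new"

# --reencode=no leaves the message bytes and the encoding header alone;
# core.ignorecase=false is set only for this fast-import invocation.
git -C "$old" fast-export --reencode=no --all \
    | git -C "$new" -c core.ignorecase=false fast-import --quiet

old_head=$(git -C "$old" rev-parse refs/heads/main)
new_head=$(git -C "$new" rev-parse refs/heads/main)
echo "old: $old_head"
echo "new: $new_head"
```

Comparing the two printed ids shows whether this particular commit survived the round trip; it says nothing about the other issues listed at [1].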


