Re: Round-tripping fast-export/import changes commit hashes

On Mon, Aug 9, 2021 at 8:45 AM anatoly techtonik <techtonik@xxxxxxxxx> wrote:
>
> On Thu, Mar 4, 2021 at 3:56 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >
> > Johannes Sixt <j6t@xxxxxxxx> writes:
> >
> > > Am 02.03.21 um 22:52 schrieb anatoly techtonik:
> > >> For my use case, where I just need to attach another branch in
> > >> time without altering original commits in any way, `reposurgeon`
> > >> can not be used.
> > >
> > > What do you mean by "attach another branch in time"? Because if you
> > > really do not want to alter original commits in any way, perhaps you
> > > only want `git fetch /the/other/repository master:the-other-one-s-master`?
> >
> > Yeah, I had the same impression.  If a bit-for-bit identical copy of
> > the original history is needed, then fetching from the original
> > repository (either directly or via a bundle) would be a much simpler
> > and performant way.
>
> The goal is to have an editable stream, which, if left without edits, would
> be bit-by-bit identical, so that external tools like `reposurgeon` could
> operate on that stream and be audited.

There were some patches proposed some months back[1] to make
fast-import allow importing signed commits...except that they
unconditionally kept the signatures and didn't do any validation,
which would have resulted in invalid signatures if any edits happened.
I suggested adding signature verification (which would allow options
like erroring out if signatures didn't match, or dropping signatures
when they didn't match but keeping them otherwise).  That'd help use
cases like yours.  The author wasn't interested in implementing that
suggestion (and it's a low enough priority for me that I may never get
around to it).  The series also wasn't pushed forward and was
eventually dropped.

However, that wouldn't fully achieve your stated goal.  As already
mentioned earlier in this thread, I don't think that goal is
realistic; the only complete bit-for-bit identical representation of
the repository is the original binary format.

That goal, however, isn't actually required for solving the use case
you present.

[1] https://lore.kernel.org/git/20210430232537.1131641-1-lukeshu@xxxxxxxxxxx/

> Right now, because the repository
> https://github.com/simons-public/protonfixes contains a signed commit
> right from the start, the simple fast-export and fast-import with git itself
> fails the check.

Yes, and I mentioned several other reasons why a round-trip from
fast-export through fast-import cannot be relied upon to preserve
object hashes.
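
If you want to see that for yourself, a quick check along these lines
(purely illustrative; it assumes a branch named "main" and uses
scratch paths under /tmp) will show the tip hashes diverging for a
repository like yours:

    # Export everything, import it into a scratch repository, and
    # compare the tip commit hashes; the stripped signature (among
    # other things) makes them differ.
    git fast-export --all > /tmp/dump.fi
    git init --bare /tmp/roundtrip.git
    git -C /tmp/roundtrip.git fast-import < /tmp/dump.fi
    git rev-parse main
    git -C /tmp/roundtrip.git rev-parse main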

> I understand that patching `git` to add `--complete` to fast-import is
> realistically beyond my coding abilities, and my only option is to parse

More patching than that would be required:
(1) It'd be both fast-export and fast-import that would need patching,
not just fast-import.
(2) --complete is a bit of a misnomer too, because it's not just
get-all-the-data, it's keep-the-data-in-the-original-format.  If
objects had modes of 040000 instead of 40000, despite meaning the same
thing, you'd have to prevent canonicalization and store them as the
original recorded value or you'd get a different hash (see the sketch
after this list).  Ditto for commit messages with extra data after a
NUL byte, and a variety of other possible issues.
(3) fast-export works by looking for the relevant bits it knows how to
export.  You'd have to redesign it to fully parse every bit of data in
each object it looks at, throw errors on anything it didn't recognize,
and make sure it exports all the bits.  That might be difficult since
it's hard to know how to future-proof it.  How do you guarantee you've
printed every field in a commit struct, when that struct might gain
new fields in the future?  (This is especially challenging since
fast-export/fast-import might not be considered core tools, or at
least don't get as much attention as the "truly core" parts of git;
see https://lore.kernel.org/git/xmqq36mxdnpz.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/)
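
As a concrete illustration of the mode point in (2) (a rough sketch;
it assumes the top-level tree contains at least one subdirectory, and
uses xxd just for display):

    # Pretty-printed view: subdirectory entries are shown as 040000.
    git cat-file -p 'HEAD^{tree}'

    # Raw bytes that actually get hashed: canonical git writes the
    # mode as "40000" with no leading zero.  A tree that recorded
    # "040000" would mean the same thing but hash differently, so a
    # hash-preserving exporter couldn't canonicalize it.
    git cat-file tree 'HEAD^{tree}' | xxd | head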

> the binary stream produced by `git cat-file --batch`, which I also won't
> be able to do without specification.

The specification is already available in the manual.  Just run `git
cat-file --help` to see it.  Let me quote part of it for you:

       For example, --batch without a custom format would produce:

           <sha1> SP <type> SP <size> LF
           <contents> LF
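
So, for example, feeding an object name on stdin (illustrative):

    # Prints "<sha1> commit <size>", then the raw commit object; for a
    # signed commit the contents will include the gpgsig header.
    echo HEAD | git cat-file --batch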

> P.S. I am resurrecting the old thread, because my problem with editing
> the history of the repository with an external tool still can not be solved.

Sure it can: just use fast-export's --reference-excluded-parents
option and don't export the commits you know you won't need to change.
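
For example (a sketch, not exact commands for your repository: the
branch name and the hash standing in for your signed commit are
placeholders):

    # Export only the commits after the signed one.
    # --reference-excluded-parents makes the stream say "from <sha1>"
    # for the excluded parent instead of pruning it from its children.
    git fast-export --reference-excluded-parents \
        master --not d3d24b63446c7d06586eaa51764ff0c619113f09 > editable.fi

    # ...run your external tool over editable.fi, then re-import.
    # --force is only needed if the rewritten branch is not a
    # fast-forward of the old one.
    git fast-import --force < editable.fi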

Or, if for some reason you are really set on exporting everything and
then editing, then go ahead and create the full fast-export output,
make all your edits, and then post-process the dump manually before
feeding it to fast-import.  In particular, in the post-processing step
find the problematic commits that you know won't be modified, such as
your signed commit.  Then edit the fast-export dump to (a) remove the
dump of that commit (its re-exported copy has lost its signature, so
you don't want it), and (b) replace any references to it by mark
(e.g. "from :12") with the hash of the actual original signed commit
(e.g. "from d3d24b63446c7d06586eaa51764ff0c619113f09").  If you do
that, then git fast-import will build the new commits directly on the
existing signed commit instead of on a new commit that is missing the
signature.  Technically, you can even skip step (a); all that will do
is produce an extra, unreferenced commit in your repository, which
will be garbage collected later.
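
To make step (b) concrete, a fragment of the dump would go from
something like this (mark numbers and the hash are placeholders, and
the author/committer/data lines, elided here as "...", stay the same):

    commit refs/heads/master
    mark :13
    ...
    from :12

to:

    commit refs/heads/master
    mark :13
    ...
    from d3d24b63446c7d06586eaa51764ff0c619113f09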


