On Mon, Aug 9, 2021 at 8:45 AM anatoly techtonik <techtonik@xxxxxxxxx> wrote:
>
> On Thu, Mar 4, 2021 at 3:56 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >
> > Johannes Sixt <j6t@xxxxxxxx> writes:
> >
> > > Am 02.03.21 um 22:52 schrieb anatoly techtonik:
> > >> For my use case, where I just need to attach another branch in
> > >> time without altering original commits in any way, `reposurgeon`
> > >> can not be used.
> > >
> > > What do you mean by "attach another branch in time"? Because if you
> > > really do not want to alter original commits in any way, perhaps you
> > > only want `git fetch /the/other/repository master:the-other-one-s-master`?
> >
> > Yeah, I had the same impression. If a bit-for-bit identical copy of
> > the original history is needed, then fetching from the original
> > repository (either directly or via a bundle) would be a much simpler
> > and more performant way.
>
> The goal is to have an editable stream, which, if left without edits, would
> be bit-by-bit identical, so that external tools like `reposurgeon` could
> operate on that stream and be audited.

There were some patches proposed some months back[1] to make fast-import
allow importing signed commits...except that they unconditionally kept the
signatures and didn't do any validation, which would have resulted in
invalid signatures if any edits happened. I suggested adding signature
verification (which would allow options like erroring out if the signatures
didn't match, or dropping signatures that didn't match while keeping the
ones that did). That would help use cases like yours. The author wasn't
interested in implementing that suggestion (and it's a low enough priority
for me that I may never get around to it). The series also wasn't pushed
through and was eventually dropped. However, that wouldn't fully solve your
stated goal.
As already mentioned earlier in this thread, I don't think your stated goal
is realistic; the only complete bit-for-bit identical representation of the
repository is the original binary format. Your stated goal here, however,
isn't required for solving the use case you present.

[1] https://lore.kernel.org/git/20210430232537.1131641-1-lukeshu@xxxxxxxxxxx/

> Right now, because the repository
> https://github.com/simons-public/protonfixes contains a signed commit
> right from the start, the simple fast-export and fast-import with git itself
> fails the check.

Yes, and I mentioned several other reasons why a round trip from
fast-export through fast-import cannot be relied upon to preserve object
hashes.

> I understand that patching `git` to add `--complete` to fast-import is
> realistically beyond my coding abilities, and my only option is to parse

More patching than that would be required:

(1) Both fast-export and fast-import would need patching, not just
fast-import.

(2) --complete is a bit of a misnomer, too, because it's not just
get-all-the-data, it's keep-the-data-in-the-original-format. If a tree
entry recorded a mode of 040000 instead of 40000, despite the two meaning
the same thing, you'd have to prevent canonicalization and store the
original recorded value, or you'd get a different hash. Ditto for commit
messages with extra data after a NUL byte, and a variety of other possible
issues.

(3) fast-export works by looking for the relevant bits it knows how to
export. You'd have to redesign it to fully parse every bit of data in each
object it looks at, throw errors if it didn't recognize something, and make
sure it exports all the bits. That might be difficult since it's hard to
know how to future-proof it: how do you guarantee you've printed every
field in a commit struct when that struct might gain new fields in the
future?
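The canonicalization problem in (2) is easy to demonstrate, because the mode
string is part of the bytes that git hashes. Here's a small sketch (the
single-entry tree and the name "subdir" are made up for illustration) that
builds a raw tree object by hand and shows that "40000" and "040000"
produce different object ids even though they denote the same mode:

```python
import hashlib

def tree_hash(mode: str) -> str:
    # A tree entry is "<mode> <name>\0" followed by the raw 20-byte
    # object id; the object itself is hashed as "tree <size>\0<entries>".
    entry = f"{mode} subdir".encode() + b"\x00" + bytes(20)
    body = b"tree %d\x00" % len(entry) + entry
    return hashlib.sha1(body).hexdigest()

canonical = tree_hash("40000")    # the form git normally writes
padded = tree_hash("040000")      # same meaning, different bytes
assert canonical != padded        # hence a different object id
```

Any tool that canonicalizes the mode on the way through therefore changes
the tree's hash, and every commit reachable from it.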
(This is especially challenging since fast-export/fast-import might not be
considered core tools, or at least don't get as much attention as the
"truly core" parts of git; see
https://lore.kernel.org/git/xmqq36mxdnpz.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/)

> the binary stream produced by `git cat-file --batch`, which I also won't
> be able to do without specification.

The specification is already available in the manual; just run `git
cat-file --help` to see it. Let me quote part of it for you:

    For example, --batch without a custom format would produce:

        <sha1> SP <type> SP <size> LF
        <contents> LF

> P.S. I am resurrecting the old thread, because my problem with editing
> the history of the repository with an external tool still can not be solved.

Sure it can; just use fast-export's --reference-excluded-parents option and
don't export the commits you know you won't need to change.

Or, if for some reason you are really set on exporting everything and then
editing, then go ahead and create the full fast-export output, including
all your edits, and then post-process it manually before feeding it to
fast-import. In particular, in the post-processing step, find the
problematic commits that you know won't be modified, such as your signed
commit. Then go edit that fast-export dump and (a) remove the dump of the
no-longer-signed signed commit (because you don't want it), and (b) replace
any references to the no-longer-signed commit (e.g. "from :12") with the
hash of the actual original signed commit (e.g. "from
d3d24b63446c7d06586eaa51764ff0c619113f09"). If you do that, then git
fast-import will build the new commits on the existing signed commit
instead of on some new commit that is missing the signature.

Technically, you can even skip step (a), as all it will do is produce an
extra unused commit in your repository, which will be garbage collected
later.
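Step (b) above is a one-line stream edit. As a sketch (the dump fragment,
committer identity, and mark :13 below are invented; mark :12 and the hash
d3d24b63446c7d06586eaa51764ff0c619113f09 are the values from the example
in this mail):

```shell
# Hypothetical fast-export dump fragment referencing the signed
# commit by its mark.
cat > dump.fi <<'EOF'
commit refs/heads/master
mark :13
committer A U Thor <author@example.com> 1234567890 +0000
data 5
edit
from :12
EOF

# Step (b): point the child at the original signed commit's hash, so
# fast-import builds on the existing object rather than a re-created
# (and unsigned) one.
sed -i 's/^from :12$/from d3d24b63446c7d06586eaa51764ff0c619113f09/' dump.fi
grep '^from' dump.fi
```

The edited dump can then be fed to `git fast-import` as usual.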