On Mon, Aug 09, 2021 at 07:12:13PM +0300, anatoly techtonik wrote:

> As an alternative it appeared that there is also a
> "git binary stream" log that is produced by
>
>   git cat-file --batch --batch-all-objects
>
> Is there a way to reconstruct the repository given that stream?

Yes, though it is probably not the easiest way to do so.

Just dumping all of the object contents back into another repository
will indeed give you the same hashes, etc. But if you change one object,
then its hash will change, and all of the other objects pointing to it
will need to change, etc. And that dump is in apparently-random order
with respect to the actual graph structure and the relationships between
objects.

You'd probably do better to build a tool around rev-list, and only use
cat-file to fetch the verbatim object contents. At some point your tool
would start to look a lot like fast-export/fast-import, and it may be
less work to teach them whatever features you need to avoid any
normalization (e.g., retaining signatures, encodings, etc).

> Is there documentation on how to read it?

The output format is described in the "BATCH OUTPUT" section of
"git help cat-file". Basically you get each object id, type, and size in
bytes, followed by the object contents. You can use the size from the
header to know how many bytes to read.

There's no tool to accept the whole stream. You'd have to parse each
entry and feed it to "git hash-object" with the appropriate type.

Having a mode for hash-object to read in a bunch of objects in
"cat-file --batch" format wouldn't be unreasonable, but nobody has found
a need for it so far. It would also be quite slow (it writes out
individual loose objects, whereas something like fast-import writes out
a packfile, including at least a basic attempt at deltas).

-Peff
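
For illustration, the parse-and-feed approach described above might look
roughly like the Python sketch below. It is not an existing Git tool. The
names read_batch, object_id, write_object and restore are made up for this
example, and it assumes a SHA-1 repository, a stream produced by
"git cat-file --batch --batch-all-objects" (so no "missing" entries), and
a target repository that has already been created with "git init".

```python
import hashlib
import subprocess
from typing import BinaryIO, Iterator, Tuple


def read_batch(stream: BinaryIO) -> Iterator[Tuple[str, str, bytes]]:
    """Yield (oid, type, contents) for each entry in a cat-file --batch stream."""
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        # Header line: "<oid> <type> <size>\n". A "<name> missing" line would
        # fail to unpack here; this sketch assumes the stream has none.
        oid, obj_type, size_text = header.decode("ascii").split()
        size = int(size_text)
        body = stream.read(size)
        if len(body) != size:
            raise ValueError(f"truncated contents for {oid}")
        # The contents are followed by one LF that is not part of the object.
        if stream.read(1) != b"\n":
            raise ValueError(f"missing LF after contents of {oid}")
        yield oid, obj_type, body


def object_id(obj_type: str, body: bytes) -> str:
    """Recompute a SHA-1 object id: hash of "<type> <size>\\0" plus contents."""
    prefix = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(prefix + body).hexdigest()


def write_object(repo: str, obj_type: str, body: bytes) -> str:
    """Write one loose object with `git hash-object -w`; return the id it prints."""
    result = subprocess.run(
        ["git", "-C", repo, "hash-object", "-t", obj_type, "-w", "--stdin"],
        input=body,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


def restore(stream: BinaryIO, repo: str) -> int:
    """Feed every entry in the stream to hash-object; return the object count."""
    count = 0
    for oid, obj_type, body in read_batch(stream):
        if object_id(obj_type, body) != oid:
            raise ValueError(f"contents do not hash to {oid}")
        new_oid = write_object(repo, obj_type, body)
        if new_oid != oid:
            raise ValueError(f"git wrote {new_oid}, expected {oid}")
        count += 1
    return count
```

Calling restore(sys.stdin.buffer, "/path/to/new-repo") with the dump piped
in on stdin (the path is a placeholder) would write each object as a loose
object. It spawns one hash-object process per object, so it has the same
speed problem mentioned above, and it does not recreate any refs: those
are not part of the cat-file stream and would have to be carried over
separately.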