On Mon, Mar 26 2018, Jonathan Nieder wrote:

> Hi Ævar,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> It occurred to me recently that once we have such a layer it could be
>> (ab)used with some relatively minor changes to do any arbitrary
>> local-to-remote object content translation, unless I've missed something
>> (but I just re-read hash-function-transition.txt now...).
>>
>> E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a
>> remote server so that you upload a GPG encrypted version of all your
>> blobs, and have your trees reference those blobs.
>
> Interesting!
>
> To be clear, this would only work with deterministic encryption.
> Normal GPG encryption would not have the round-tripping properties
> required by the design.

Right, sorry. I was being lazy. For simplicity let's say rot13 or some
other deterministic algorithm.

> If I understand correctly, it also requires both sides of the
> connection to have access to the encryption key. Otherwise they
> cannot perform ordinary operations like revision walks. So I'm not
> seeing a huge advantage over ordinary transport-layer encryption.
>
> That said, it's an interesting idea --- thanks for that. I'm changing
> the subject line since otherwise there's no way I'll find this again. :)

In this specific implementation I have in mind only one side would have
the key, we'd encrypt just up to the point where the repository would
still pass fsck. But of course once we had that facility we could do any
arbitrary translation.

I.e.
consider the latest commit in git.git:

    commit 90bbd502d54fe920356fa9278055dc9c9bfe9a56
    tree 5539308dc384fd11055be9d6a0cc1cce7d495150
    parent 085f5f95a2723e8f9f4d037c01db5b786355ba49
    parent d32eb83c1db7d0a8bb54fe743c6d1dd674d372c5
    author Junio C Hamano <gitster@xxxxxxxxx> 1521754611 -0700
    committer Junio C Hamano <gitster@xxxxxxxxx> 1521754611 -0700

    Sync with Git 2.16.3

With rot13 "encryption" it would be:

    commit <different hash>
    tree <different hash>
    parent <different hash>
    parent <different hash>
    author Whavb P Unznab <tvgfgre@xxxxxxxxx> 1521754611 -0700
    committer Whavb P Unznab <tvgfgre@xxxxxxxxx> 1521754611 -0700

    Flap jvgu Tvg 2.16.3

And an ls-tree on that tree hash would instead of README.md give you:

    100644 blob <different hash>    ERNQZR.zq

And inspecting that blob would give you:

    # Rot13'd "Hello, World!"
    Uryyb, Jbeyq!

So obviously for the encryption use-case such a repo would leak a lot of
info compared to just uploading the fast-export version of it
periodically as one big encrypted blob to store somewhere, but the
advantage would be:

 * It's better than existing "just munge the blobs" encryption solutions
   bolted on top of git, because at least you encrypt the commit
   message, author names & filenames.

 * Since it would be a valid repo even without the key, you could use
   git hosting solutions for it, similar to checking in encrypted blobs
   in existing git repos.

 * As noted, it could be a permanent stress test on the SHA-1<->NewHash
   codepath.

I can't think of a reason for why once we have that we couldn't add the
equivalent of clean/smudge filters. We need to unpack & repack & re-hash
all the stuff we send over the wire anyway, so we can munge it as it
goes in/out as long as the same input values always yield the same
output values.
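
To make the "same input always yields the same output" requirement concrete, here is a rough Python sketch of the rot13 translation described above. It is not git code and not part of any existing design: the names `rot13`, `translate_commit`, `object_id` and the `id_map` parameter are all made up for illustration, and the sample commit uses a placeholder identity and placeholder object IDs. It assumes a commit body of the usual "headers, blank line, message" shape, translates only the identity fields and the message, and rewrites tree/parent IDs through a caller-supplied map (in a real implementation those would be the IDs of the already-translated objects).

```python
import codecs
import hashlib


def rot13(text):
    """Deterministic, self-inverse stand-in for 'encryption'."""
    return codecs.encode(text, "rot_13")


def translate_commit(body, id_map=None):
    """Translate a commit body: rot13 the names and message, remap object IDs.

    id_map is a hypothetical local-ID -> remote-ID table; IDs that are not
    in it are passed through unchanged.
    """
    id_map = id_map or {}
    header, sep, message = body.partition("\n\n")
    out = []
    for line in header.split("\n"):
        key, _, value = line.partition(" ")
        if key in ("author", "committer"):
            # value looks like "Name <email> 1521754611 -0700";
            # keep the timestamp and timezone so the object stays well-formed.
            ident, timestamp, tz = value.rsplit(" ", 2)
            value = f"{rot13(ident)} {timestamp} {tz}"
        elif key in ("tree", "parent"):
            value = id_map.get(value, value)
        out.append(f"{key} {value}")
    return "\n".join(out) + sep + rot13(message)


def object_id(kind, body):
    """SHA-1 object name: hash of '<kind> <size>\\0' followed by the content."""
    data = body.encode("utf-8")
    return hashlib.sha1(f"{kind} {len(data)}\0".encode() + data).hexdigest()


# Placeholder commit; the IDs and identity are illustrative only.
local = (
    "tree " + "1" * 40 + "\n"
    "parent " + "2" * 40 + "\n"
    "author A U Thor <author@example.com> 1521754611 -0700\n"
    "committer A U Thor <author@example.com> 1521754611 -0700\n"
    "\n"
    "Sync with Git 2.16.3\n"
)

remote = translate_commit(local)
print(remote)
print(object_id("commit", local), "->", object_id("commit", remote))
```

Because rot13 is its own inverse, running `translate_commit` on the translated body gives back the original, which is the round-tripping property Jonathan mentioned. A randomized cipher would produce a different remote object (and so a different remote hash) each time, and the local<->remote ID mapping would stop being stable.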