Re: Per-object encryption (Re: Git Merge contributor summit notes)

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 26 Mar 2018 23:22:18 +0200

On Mon, Mar 26 2018, Jonathan Nieder wrote:

> Hi Ævar,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> It occurred to me recently that once we have such a layer it could be
>> (ab)used with some relatively minor changes to do any arbitrary
>> local-to-remote object content translation, unless I've missed something
>> (but I just re-read hash-function-transition.txt now...).
>>
>> E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a
>> remote server so that you upload a GPG encrypted version of all your
>> blobs, and have your trees reference those blobs.
>
> Interesting!
>
> To be clear, this would only work with deterministic encryption.
> Normal GPG encryption would not have the round-tripping properties
> required by the design.

Right, sorry. I was being lazy. For simplicity let's say rot13 or some
other deterministic algorithm.

> If I understand correctly, it also requires both sides of the
> connection to have access to the encryption key.  Otherwise they
> cannot perform ordinary operations like revision walks.  So I'm not
> seeing a huge advantage over ordinary transport-layer encryption.
>
> That said, it's an interesting idea --- thanks for that.  I'm changing
> the subject line since otherwise there's no way I'll find this again. :)

In this specific implementation I have in mind only one side would have
the key, we'd encrypt just up to the point where the repository would
still pass fsck. But of course once we had that facility we could do any
arbitrary translation .

I.e. consider the latest commit in git.git:

    commit 90bbd502d54fe920356fa9278055dc9c9bfe9a56
    tree 5539308dc384fd11055be9d6a0cc1cce7d495150
    parent 085f5f95a2723e8f9f4d037c01db5b786355ba49
    parent d32eb83c1db7d0a8bb54fe743c6d1dd674d372c5
    author Junio C Hamano <gitster@xxxxxxxxx> 1521754611 -0700
    committer Junio C Hamano <gitster@xxxxxxxxx> 1521754611 -0700

        Sync with Git 2.16.3

With rot13 "encryption" it would be:

    commit <different hash>
    tree <different hash>
    parent <different hash>
    parent <different hash>
    author Whavb P Unznab <tvgfgre@xxxxxxxxx> 1521754611 -0700
    committer Whavb P Unznab <tvgfgre@xxxxxxxxx> 1521754611 -0700

        Flap jvgu Tvg 2.16.3

And an ls-tree on that tree hash would instead of README.md give you:

    100644 blob <different hash> ERNQZR.zq

And inspecting that blob would give you:

    # Rot13'd "Hello, World!"
    Uryyb, Jbeyq!

So obviously for the encryption use-case such a repo would leak a lot of
info compared to just uploading the fast-export version of it
periodically as one big encrypted blob to store somewhere, but the
advantage would be:

 * It's better than existing "just munge the blobs" encryption solutions
   bolted on top of git, because at least you encrypt the commit
   message, author names & filenames.

 * Since it would be a valid repo even without the key, you could use
   git hosting solutions for it, similar to checking in encrypted blobs
   in existing git repos.

 * As noted, it could be a permanent stress test on the SHA-1<->NewHash
   codepath.

   I can't think of a reason for why once we have that we couldn't add
   the equivalent of clean/smudge filters.

   We need to unpack & repack & re-hash all the stuff we send over the
   wire anyway, so we can munge it as it goes in/out as long as the same
   input values always yield the same output values.