Re: encrypted repositories?

On 20.07.2009 at 17:30, Jeff King <peff@xxxxxxxx> wrote:

> On Mon, Jul 20, 2009 at 02:09:28PM +0200, Matthias Andree wrote:

>> No, the server can't be allowed access to the keys or decrypted data.
>>
>> I'm not sure about the graph, or whether I should be concerned.
>> Exposing the DAG might be in order.
>>
>> It would be OK if the on-disk storage and the over-the-wire format
>> cannot use delta compression then. It would suffice to just send a
>> set of objects efficiently - and perhaps smaller revisions can be
>> delta-compressed by the clients when pushing.

> The problem is that you need to expose not just the DAG, but also the
> hashes of trees and blobs. Because if I know you have master^, and I want
> to send you master, then I need to know which objects are referenced by
> master that are not referenced by master^.

Yes, you need to know that. Not all of the push logic needs to be implemented on the server though.

In my scenario, the server degenerates into sort of a general object store - I really don't expect much smartness there. What is easily available (clients providing deltas rather than full objects) could be exploited, and that's it.

We can always have two local repositories, one reference and one checkout. The reference is a decrypted (unencrypted) copy of the set of objects on the server, and I could use it to track the server-side view: for instance, what master^ and master point at, so that git rev-list master^...master tells me what I need to send to the server.
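A rough sketch of that setup (paths made up; the actual per-object encryption step left out):

  # "reference" mirrors the server-side view, kept in decrypted form
  git clone --mirror checkout reference

  # inside "checkout": which objects does the server not have yet?
  git rev-list --objects master^..master

  # each listed object would then be encrypted individually (no deltas)
  # and uploaded to the dumb object store

The point is that all the range computation happens locally against the reference; the server only ever sees opaque encrypted objects.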

I'm well aware that crypto requires more effort on the client side if we don't trust the server; that's only natural.

The question is: which VCS can serve my scenario?

> So now you have security implications, because I can do an offline
> guessing attack against your files (i.e., calculate git blob hashes for
> likely candidates and see if you have them). Whether that is a problem
> really depends on your data.

Or look at commit frequency and push sources. There's always a leak of information, even if I just upload a series of blah-2009MMDD-NNN.tar.lzma.gpg files... In my case the data is obsolete after, say, 3 months; the students write the exam, and then it's sort of public anyway. Even if your model entails never publishing exams (as opposed to embargoed press releases under development), you can't prevent someone from writing down their recollection of the problems from memory afterwards and sharing it with other students.
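To spell out the guessing attack you describe (the file name is of course made up):

  # the blob name git would assign to a candidate plaintext
  git hash-object exam-2009-07.tex

  # an existence check against the store for that name
  # confirms or refutes the guess
  git cat-file -e <sha1-from-above> && echo "they have it"

That costs the attacker nothing but CPU time, so it is a real concern for low-entropy data.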

> Not to mention that it makes the protocol a lot more complex, as you
> would be encrypting _parts_ of objects, like the filenames of a tree,
> and the commit message of a commit object.
>
> I suppose in theory you could obfuscate the sha1's in a way that
> preserved the object relationships but revealed no information. That is,
> the server would have one "fake" set of sha1's, and the client would map
> its real sha1's to the fake ones when talking with the server. But that
> is again potentially getting complex.

Is your concern that the object name (SHA1) is derived from the unencrypted version?
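If so, one could perhaps derive the "fake" names with a keyed hash, so the mapping is deterministic for the clients but opaque to the server. A rough sketch, key made up:

  # map a real object name to its server-side alias; without the
  # key, the alias reveals nothing about the original name
  printf %s "$real_sha1" | openssl dgst -sha1 -hmac "$secret_key"

That would preserve the object relationships you mention without handing the server the real names - at the price of every client sharing the key, which they need for the decryption anyway.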

--
Matthias Andree
