Re: Is the sha256 object format experimental or not?

On 10.05.2021 22:42, brian m. carlson wrote:
> Almost nobody is using it because the main forges don't yet support it,
> because it's going to be just as much work to support it there as it has
> been in Git.  We won't be making it easier by making deliberately
> incompatible changes when we don't have to.

I know that you said there is no reason to make a breaking change to the
SHA256 implementation now, but because of what you say above, I think we
still have the opportunity to make breaking changes. In any case I think
we only need to make one breaking change to gain algorithmic agility
going forward and avoid painful, multi-year transitions like the one
you've been executing.

My project to add universal cryptographic signing to Git by using a
standard protocol and generalized configuration to support any
cryptographic signing scheme could also apply to the digests as well,
and I think it should. If object digests in Git were self-describing
(i.e. they contain an algorithm identifier as well as the digest) then
repos gain "algorithmic agility" and can change algorithms at any time
to keep up as algorithms grow stale and are replaced.

I think Git should externalize the calculation of object digests just
like it externalizes the calculation of object digital signatures.
Cryptography is very difficult to get correct and the dedicated tools
for that (e.g. OpenSSH, OpenSSL, GnuPG, etc) get lots of scrutiny and
have the best chance of getting it right. I don't think Git should try
to do cryptography at all.

Object digests should just be names for objects; Git doesn't really need to
know anything more than "is this the name for that object?". Answering
that question can, and should, be done by an external tool that is
implemented correctly and hardened against attack. I think the only
counter-argument for this approach is performance-related. Pipe-forking a
child process and reading/writing over IPC pipes is expensive in terms
of context switching and process setup/teardown but there are a number
of mitigations I won't go into here.
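
To make the idea concrete, here is a minimal sketch of handing digest calculation to an external process over pipes. The helper names `external_digest` and `is_name_for` are hypothetical, and a Python one-liner stands in for a real hashing program so the sketch runs anywhere; nothing here reflects how Git works today.

```python
import hashlib
import subprocess
import sys

# Stand-in for an external digest program: reads stdin, prints a hex digest.
# A real setup would name a dedicated, hardened tool here (assumption).
TOOL = [
    sys.executable, "-c",
    "import sys, hashlib; "
    "print(hashlib.blake2b(sys.stdin.buffer.read()).hexdigest())",
]


def external_digest(data: bytes, tool=TOOL) -> str:
    """Pipe `data` to the tool's stdin and return the hex digest it prints."""
    result = subprocess.run(tool, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("ascii").strip()


def is_name_for(name: str, data: bytes) -> bool:
    """Answer the only question the caller needs: does `name` name `data`?"""
    return external_digest(data) == name
```

Every call spawns a fresh process, which is exactly the setup/teardown cost described above.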

I think we should make one last breaking change for digests and not go
with the existing SHA-256 implementation but instead switch to
self-describing digests and digital signatures and rely on external
tools that Git talks to using a standard protocol. We can maintain full
backward compatibility and even support full round tripping using some
of the similar techniques that Brian came up with. A transitional
half-old/half-new signed tag could look like:

```
object 04b871796dc0420f8e7561a895b52484b701d51a
obj 0ED_zgYrQg584bCrqKPoUvxaQ5aMis0GtnW_NrZFTTxUlHLUOyp77LanoZEGV6ajhYGLGTaTfCIQhryovyeNFJuG
type commit
tag signedtag
tagger C O Mitter <committer@xxxxxxxxxxx> 1465981006 +0000
signtype openpgp
sign LS0tLS1CRUdJTiBQR1AgU0lHTkFUVVJFLS0tLS0KVmVyc2lvbjogR251UEcgdjEKCmlRRWN
CQUFCQWdBR0JRSlhZUmhPQUFvSkVHRUpMb1czSW5HSmtsa0lBSWNuaEw3UndFYi8rUWVYOWVua1
hoeG4KcnhmZHFydldkMUs4MHNsMlRPdDhCZy9OWXdyVUJ3L1JXSitzZy9oaEhwNFd0dkUxSERHS
GxrRXozeTExTGt1aAo4dFN4UzNxS1R4WFVHb3p5UEd1RTkwc0pmRXhoWmxXNGtuSVExd3QveVdx
TSszM0U5cE40aHpQcUx3eXJkb2RzCnE4RldFcVBQVWJTSlhvTWJSUHcwNFM1anJMdFpTc1VXYlJ
Zam1KQ0h6bGhTZkZXVzRlRmQzN3VxdUlhTFVCUzAKcmtDM0pyeDc0MjBqa0lwZ0ZjVEkyczYwdW
hTUUx6Z2NDd2RBMnVrU1lJUm5qZy96RGtqOCszaC9HYVJPSjcyeApsWnlJNkhXaXhLSmtXdzhsR
TlhQU9EOVRtVFc5c0ZKd2NWQXptQXVGWDJrVXJlRFVLTVpkdUdjb1JZR3BEN0U9Cj1qcFhhCi0t
LS0tRU5EIFBHUCBTSUdOQVRVUkUtLS0tLQo

signed tag

signed tag message body
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
=jpXa
-----END PGP SIGNATURE-----
```

I think a good move to make right now would be to add a general function
for stripping out any number of named fields from objects and also
stripping out in-body signatures found in tags. That way we can add
support in today's Git for stripping out fields/data for things like
creating/verifying the object digest and/or digital signature.
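
A rough sketch of what such a stripping function might look like. The name `strip_fields` and its signature are hypothetical; it assumes Git's usual object layout (header lines, a blank line, then the body) and that multi-line header values continue on lines starting with a space, as Git does for headers such as `gpgsig`.

```python
import re

# Matches an in-body ASCII-armored PGP signature, as found in signed tags.
_PGP_SIG = re.compile(
    rb"-----BEGIN PGP SIGNATURE-----\n.*?-----END PGP SIGNATURE-----\n?",
    re.DOTALL,
)


def strip_fields(obj: bytes, names: set[bytes], strip_body_sig: bool = True) -> bytes:
    """Return `obj` without the named header fields and, optionally,
    without an in-body PGP signature."""
    header, sep, body = obj.partition(b"\n\n")
    kept, skipping = [], False
    for line in header.split(b"\n"):
        if line.startswith(b" "):
            # Continuation line: it shares the fate of the field it extends.
            if not skipping:
                kept.append(line)
            continue
        skipping = line.split(b" ", 1)[0] in names
        if not skipping:
            kept.append(line)
    if strip_body_sig:
        body = _PGP_SIG.sub(b"", body)
    return b"\n".join(kept) + sep + body
```

For the transitional tag above, `strip_fields(tag, {b"obj", b"signtype", b"sign"})` would yield the bytes an old-style digest or signature check expects, under those assumptions.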

BTW, in the example above the 'obj' field is a self-describing
Blake2b-512 digest, URL-safe Base64 encoded using the format described
[here][1]. The starting '0E' Base64 characters identify the digest as
Blake2b-512 and also specify that the length of the digest is 64 bytes.
If you Base64 decode the 'obj' field value you get 66 bytes; the digest
value is the last 64 of those 66 bytes.
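
The decoding step can be sketched as follows. The `CODES` table and the `split_digest` helper are hypothetical; only the '0E' entry and the "last 64 bytes" rule come from the description above, and the full encoding rules live in the linked document.

```python
import base64

# The 'obj' value from the example tag above.
OBJ = "0ED_zgYrQg584bCrqKPoUvxaQ5aMis0GtnW_NrZFTTxUlHLUOyp77LanoZEGV6ajhYGLGTaTfCIQhryovyeNFJuG"

# Hypothetical lookup table: type code -> (algorithm name, digest length).
CODES = {"0E": ("blake2b-512", 64)}


def split_digest(value: str) -> tuple[str, bytes]:
    """Return (algorithm name, raw digest bytes) for a self-describing digest."""
    name, length = CODES[value[:2]]
    # Re-add '=' padding in case the value's length is not a multiple of 4.
    raw = base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))
    digest = raw[-length:]
    if len(digest) != length:
        raise ValueError("digest shorter than its type code promises")
    return name, digest
```

Note that this only splits the value apart; it does not check the digest against any object.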

By going with self-describing digests we can have configuration files
that contain 'program' and 'options.*' for each external tool that can
create/validate digests of each type. So in this case there would be
something like:

```
[digest "blake2b"]
 program = "blake2bsum"
[digest "blake2b.options"]
 length = 64
```

Using self-describing cryptographic constructs for digests and
signatures and relying on external tools makes it trivial for Git to
walk the object graph and enumerate all of the digest types and
signature types in a given repo and determine if a user has their
configuration set up correctly to work with that repo. Projects can
declare which types they are using and recommend tools to use for those
types.
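
As a small illustration of that check, the sketch below takes object names that have already been collected from a repo and reports which digest types the user has no tool configured for. Both tables and the `missing_tools` helper are hypothetical; only the '0E' code and the "blake2b" config section come from this message.

```python
# Hypothetical view of the user's configuration: digest type -> program.
CONFIGURED = {"blake2b": "blake2bsum"}

# Hypothetical type-code table; only '0E' is taken from this message.
CODE_TO_TYPE = {"0E": "blake2b"}


def missing_tools(object_names, configured=CONFIGURED):
    """Return the digest types used by `object_names` that have no
    configured program, plus markers for unrecognized type codes."""
    missing = set()
    for name in object_names:
        code = name[:2]
        digest_type = CODE_TO_TYPE.get(code, f"unknown:{code}")
        if digest_type not in configured:
            missing.add(digest_type)
    return missing
```

An empty result would mean the user's configuration covers every digest type seen in the repo.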

Cheers!
Dave

TL;DR

Let me try to lay out the case for making a breaking change to sha256
right now that will future-proof repos going forward.

It has been known for a few decades now that cryptography has a
shelf-life. By that I mean that as technology and cryptanalysis improve we
have had to make keys larger and invent new algorithms that resist the
new attacks on cryptography. This has been true of digest algorithms (i.e.
hashes), digital signatures (i.e. non-repudiation), and encryption (i.e.
confidentiality). The relevant case here is the fact that SHA-256 is
vulnerable to length-extension attacks and cryptographers have lost some
confidence in it after many Davies-Meyer (DM) structure and ARX network
designs based on MD4 were broken 20 years ago. SHA-256 uses DM plus a
block cipher based on an ARX network. The end result is that in high
security software, SHA-256 is being replaced with SHA-3 and Blake2
digests.

Another key thing to think about is that a git repo is a form of a
provenance log that could become the primary tool for securing the
software supply chain if we were to make some careful, well thought out
changes around the digests and digital signatures. What changes exactly?

1. upgrade the digests to something cryptographically secure.
2. digitally sign all commits/merges/tags using...
3. key material tracked with cryptographically secure provenance logs
inside of the repo itself.
4. switch to "late binding", "self describing" cryptographic constructs.

Let me go over these and describe how these fit together.

1. SHA-1 is not cryptographically secure and SHA-256 is already not
  being used in *new* systems and is being replaced in existing, high
  security systems. I think Git should move to more secure digest
  algorithms because the hashes in Git repos are used as naming
  identifiers for Git objects which gives them a higher security
  burden.

2. Digitally signing all commits/merges/tags is critical to tie
  contributions to contributors in a non-repudiable fashion. At the
  very least it is a more secure solution for S-o-b but it also opens
  up the possibility for cryptographically secure accountability. Banks
  and governments are already doing know-your-customer (KYC)
  verifications of identity that can be used to identify contributors
  and their contributions cryptographically. If privacy is a concern,
  zero-knowledge proofs, based on the KYC authentic data, can be used
  to create pseudonymous identities for contributors that can be linked
  to their real-world identity under judicial order. Essentially a
  developer can say, "you don't need to know my real world identity but
  here's proof that XYZ bank knows who I am and here is a large random
  number you can use to de-anonymize me with the help of a court if
  needed."

3. The key material used for identifying contributors needs to move into
  the repos themselves for many reasons but the most important two
  reasons are (1) the repo comes with all of the data necessary to
  verify all of the digital signatures (i.e. solving the PKI problem
  for a project) and (2) to track the provenance of the public keys and
  other related data that each contributor uses. If Git repos contain
  provenance logs that are controlled and maintained by each
  contributor, those logs can also contain digital signatures over the
  code of conduct and the developer certificate of origin and other
  governing documents for a project that are legally binding (i.e.
  follow eIDAS and other legal digital signature rules). Solving the
  PKI problem alone makes digitally signing commits infinitely more
  useful and will drive adoption. Solving the non-repudiable provenance
  problem is the raison d'être of organizations like the Linux
  Foundation. I think Git should align itself with where technology is
  heading on that front.

4. Currently Git uses "early-binding" for all cryptographic material.
  The digest algorithm is hard coded (SHA-1) and the new SHA-256 is as
  well. The digital signature algorithm is also hard coded as either
  GPG or GPGSM. Early-binding makes it very difficult to plan for the
  obsolescence of cryptographic algorithms. The solution is to move to
  "late-binding"/"self-describing" cryptographic constructs. If Git
  were to switch to self-describing digests and digital signatures,
  then Git could be entirely agnostic to cryptography and rely entirely
  upon external cryptographic tools for creating/verifying digests and
  digital signatures. Instead of the direction we're taking on the
  SHA256 changeover, I think Git should switch to self-describing
  digests and digital signatures and use a standard protocol for
  talking to external cryptographic tools instead of trying to get
  cryptography correct in its code.

  Secure Scuttlebutt uses late-binding constructs that consist of a type
  "sigil", a Base64 encoded key/digest/blob, and an algorithm
  descriptor (e.g. ".sha256" or ".ed25519"). Other examples exist, such
  as the Multihash encoding scheme for self-describing hashes. All of
  my work on secure provenance logs uses the emerging consensus
  encoding described [here][1]. It uses Base64 encoded cryptographic
  data and it fills what would be the padding bytes with type
  identifiers. I'm not the only one thinking along these lines: so are
  the [KERI project][2] at the Decentralized Identity Foundation and
  [Konstantin][3].
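
To illustrate the Secure Scuttlebutt style of late binding mentioned in point 4, here is a minimal sketch that builds and parses an identifier of the form sigil + Base64 data + "." + algorithm. The helper `parse_ssb_id` is hypothetical, and the sigil table reflects my understanding of the SSB convention ('%' for messages, '@' for feeds, '&' for blobs) rather than anything Git does.

```python
import base64
import hashlib

# Sigil -> kind of thing being named (SSB convention, as I understand it).
SIGILS = {"%": "message", "@": "feed", "&": "blob"}


def parse_ssb_id(ident: str):
    """Split an SSB-style identifier into (kind, raw bytes, algorithm)."""
    kind = SIGILS[ident[0]]
    data_b64, _, algorithm = ident[1:].rpartition(".")
    return kind, base64.b64decode(data_b64), algorithm


# Build a blob-style identifier for some bytes, then parse it back.
payload = b"hello"
ident = "&" + base64.b64encode(hashlib.sha256(payload).digest()).decode() + ".sha256"
kind, raw, algorithm = parse_ssb_id(ident)
```

The point is that the algorithm travels with the value, so a reader never has to guess which hash or signature scheme produced it.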


[1]: https://github.com/decentralized-identity/keri/blob/master/kids/kid0001.md
[2]: https://identity.foundation/working-groups/keri.html
[3]: https://people.kernel.org/monsieuricon/patches-carved-into-developer-sigchains


