On Wed, Aug 28, 2019 at 07:47:06PM -0400, Jeff King wrote:
> On Wed, Aug 28, 2019 at 04:32:24PM -0400, Konstantin Ryabitsev wrote:
>> If I know that a project uses tag signing, would "git clone" followed by
>> "git verify-tag" be meaningful without a "git fsck" in between? I.e., if
>> an attacker has control over the remote server, can they sneak any
>> badness into any of the resulting files and still have the clone,
>> checkout, and verify-tag return success unless the repository is fsck'd
>> before verify-tag?
> It depends on your definition of badness. :)
As you know, for the Linux kernel we provide both tag signatures and
detached PGP signatures on tarballs (and the same is true for git). The
argument I hear frequently is that providing detached tarball signatures
is redundant[*] when tags are already PGP-signed, so I wanted to
double-check that all checksums are computed and matched on the client
in the process of "git checkout" and we're not just verifying a
signature of a non-verified checksum.
In other words, I needed to double-check that what we get in the end is
assurance that "all files in this repository are exactly the same as on
the developer's system at the time when they ran 'git tag -s'."
> Generally, Git clients do not trust the server much at all (not only not
> to be malicious, but also not to accidentally introduce bit errors).
> Even without the fsck, we will compute the sha1 of each object (we must,
> because the other side doesn't send it at all), and verify that we have
> all objects reachable from the refs. So verifying the tag at that point
> demonstrates a signature on the tag object, which refers to some commit
> via sha1, which in turn refers to the actual trees and blobs by a chain
> of sha1s. If you believe in the integrity of sha1, then it has
> effectively signed all of that content.
So, the client will actually calculate those checksums during the
checkout stage to make sure that all content in the repository matches
the hash of the commit being checked out, correct?
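As a sanity check on my own understanding: the object naming described above can be reproduced by hand, since git's sha1 covers a "<type> <size>\0" header followed by the content. A minimal sketch, assuming git and coreutils' sha1sum are available (the empty blob is a well-known fixture):

```shell
# git object id = sha1("<type> <size>\0<content>").
# The empty blob has a well-known id we can check against.
printf 'blob 0\0' | sha1sum
# -> e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# The same id falls out of git itself:
git hash-object -t blob /dev/null
```

Any flipped bit in the content changes the sha1, which is why a signature over the tag object transitively covers every tree and blob it reaches.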
> If you want to analyze each object for such malformed bits before the
> checkout, you can do so with "git fsck". But consider instead setting
> transfer.fsckObjects to check the objects while they're being indexed by
> the initial clone (i.e., while their sha1s are being computed). It's
> effectively free to do it at that point, whereas a later fsck has to
> access each object again (on the order of minutes of CPU for the
> kernel).
> I don't think there's any real safety in doing so for the case you've
> described (there's no bad pattern that fsck knows about that the actual
> checkout code does not). But it does give you an early warning, and is
> especially helpful if you're not planning to check things out yourself,
> but want to avoid hosting malicious repos.
Right, but it's not something end users are going to do if they just
want to clone a repository and access code from it. The "git clone
&& git verify-tag" workflow is now used by some distros that are
packaging GitHub releases, and I'm pretty sure they aren't setting
transfer.fsckObjects before "git clone" starts.
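For reference, folding the fsck into the transfer only takes one extra switch; something like the following sketch (the repository URL and tag name here are made up for illustration):

```shell
# Check objects while they are being indexed during the clone itself.
# One-shot via -c; a distro build script could instead set it globally
# with "git config --global transfer.fsckObjects true".
git -c transfer.fsckObjects=true \
    clone https://example.com/project.git   # hypothetical URL
cd project

# Verify the PGP signature on the release tag; this only means anything
# if the signer's public key is already in the local gpg keyring.
git verify-tag v1.0   # hypothetical tag name
```

This is effectively free during the clone, per the reasoning above, whereas a separate "git fsck" afterwards re-reads every object.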
Thanks for your help!
-K
[*] Tarball signatures may be redundant in a cryptographic sense, but for
repositories like linux.git, which are now around 1.2 GB in size, it
makes a significant difference whether someone downloads the full git
tree or just a highly compressed tarball that is only 100MB. I know that
it's possible to clone with --depth 1 to reduce the amount of downloaded
history, but that's hard on the servers and not something I really want
to widely advertise as a mechanism for getting the kernel. :) In
addition, distributing static content like tarballs is much easier
logistically than serving git repositories, and it's much harder to
introduce accidental corruption into a bunch of static files. Disk is
cheap, but CPU and admin time aren't.
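For completeness, the detached-signature path looks something like this (a sketch with an illustrative release; note that kernel.org publishes the signature over the *uncompressed* tarball, so it must be decompressed before gpg can check the .sign file):

```shell
# Fetch the compressed tarball and its detached signature.
curl -O https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.2.tar.xz
curl -O https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.2.tar.sign

# The .sign file covers the uncompressed tar, so decompress first.
xz -d linux-5.2.tar.xz

# Succeeds only if the signing key is in the local keyring and trusted.
gpg --verify linux-5.2.tar.sign linux-5.2.tar
```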