Re: git bundle format

Stephen Bash <bash@xxxxxxxxxxx> · Mon, 26 Nov 2012 15:56:24 -0500 (EST)

----- Original Message -----
> From: "Jason J CTR Pyeron (US)" <jason.j.pyeron.ctr@xxxxxxxx>
> Sent: Monday, November 26, 2012 2:24:54 PM
> Subject: git bundle format
> 
> I am facing a situation where I would like to use git bundle but at
> the same time inspect the contents to prevent a spillage[1].

As someone who faced a similar situation in a previous life, I'll offer my $0.02, but I'm certainly not the technical expert here.

> Given we have a public repository which was cloned on to a secret
> development repository. Now the developers do some work which should
> not be sensitive in any way and commit and push it to the secret
> repository.
> 
> Now they want to release it out to the public. The current process is
> to review the text files to ensure that there is no "secret" sauce
> in there and then approve its release. This current process ignores
> the change tracking and all non-content is lost.
> 
> In this situation we should assume that the bundle does not have any
> content which is already in the public repository, that is it has
> the minimum data to make it pass a git bundle verify from the public
> repository's point of view. We would then take the bundle and pipe
> it through the "git-bundle2text" program which would result in a
> "human" inspectable format as opposed to the packed format[2]. The
> security reviewer would then see all the information being released
> and with the help of the public repository see how the data changes
> the repository.
> 
> Am I barking up the right tree?

First, a shot out of left field: how about a patch-based workflow (similar to the mailing list, just replace email with sneakernet)? Patches are plain text and simple to review, which may be preferable to an "opaque" binary format.
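
A rough sketch of the patch route, assuming the secret clone tracks the public history as origin/master and the work to release sits on a branch called release (both names are made up for illustration, as is the outgoing directory). It only builds the git format-patch and git am command lines; run() is a thin hypothetical wrapper around subprocess.

```python
import subprocess


def export_patches_cmd(public_tip, release_tip, out_dir):
    """Secret side: write one patch file per commit that is not yet public."""
    return ["git", "format-patch", "-o", out_dir, f"{public_tip}..{release_tip}"]


def apply_patches_cmd(patch_files):
    """Public side: replay the reviewed patches as commits."""
    # format-patch numbers its output (0001-..., 0002-...), so a plain
    # sort puts the files back into commit order.
    return ["git", "am", *sorted(patch_files)]


def run(cmd, cwd=None):
    """Run one command, raising CalledProcessError if git reports failure."""
    subprocess.run(cmd, cwd=cwd, check=True)
```

On the secret side this would be run(export_patches_cmd("origin/master", "release", "outgoing")); after review, the files in outgoing/ are carried across and handed to apply_patches_cmd() in the public working tree. One caveat: format-patch does not emit patches for merge commits, so the merge structure is flattened.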

Second, thinking about your proposed bundle-based workflow, there are two questions I'd have to answer before I was comfortable with the solution:

  1) Does the binary bundle contain any sensitive information?
  2) Do the diffs applied to the public repo contain any sensitive information?

Question 1 seems tricky to someone who knows *nothing* about the bundle format (e.g. me).  Maybe some form of bundle2text can be vetted well enough that everyone involved believes no other information is traveling with the bundle (if so, you're golden).  Here I have to trust other experts.  On the flip side, even if the bundle itself is polluted (or simply can't be proven clean), as long as (2) is considered safe, the patching of the public repo could potentially be done on a sacrificial hard drive before pushing.
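
For reference, git's bundle format documentation describes a bundle as a short text header (a signature line, "-<oid> <comment>" prerequisite lines, "<oid> <refname>" lines, then a blank line) followed by an ordinary packfile. A minimal sketch, assuming that layout, of splitting the readable header from the binary pack; parse_bundle_header is a hypothetical helper, not part of git, and it does nothing to make the pack itself reviewable.

```python
def parse_bundle_header(data: bytes):
    """Split raw bundle bytes into (signature, prerequisites, refs, pack)."""
    # The header ends at the first blank line; everything after is pack data.
    header, sep, pack = data.partition(b"\n\n")
    if not sep:
        raise ValueError("no blank line separating header from pack")
    lines = header.decode("utf-8", "replace").split("\n")
    signature = lines[0]
    if signature not in ("# v2 git bundle", "# v3 git bundle"):
        raise ValueError("not a recognized bundle signature: %r" % signature)
    prerequisites, refs = [], []
    for line in lines[1:]:
        if line.startswith("@"):
            continue  # v3 capability line, ignored in this sketch
        if line.startswith("-"):
            # Commit the receiving repository must already have.
            oid, _, comment = line[1:].partition(" ")
            prerequisites.append((oid, comment))
        else:
            # Ref tip carried by the bundle.
            oid, _, name = line.partition(" ")
            refs.append((oid, name))
    return signature, prerequisites, refs, pack
```

Much of the same header information is printed by git bundle list-heads and git bundle verify; the point of the sketch is only that the text part is small and the rest is a pack.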

Question 2 is relatively straightforward and led me to the patch idea.  I would:
  - Bundle the public repository
  - Init a new repo in the secure space from the public bundle
  - Fetch from the to-be-sanitized bundle into the new repo
  - Examine commits (diffs) introduced by branches in the to-be-sanitized bundle
  - Perhaps get a list of all the objects in the to-be-sanitized bundle and run git cat-file on each of them (if the bundle is assembled correctly it shouldn't have any unreachable objects...).  This step may be redundant after the previous one.
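
The steps above can be sketched roughly as follows, assuming the public repository was bundled beforehand with something like "git bundle create public.bundle --all". The file names, the review directory, and the refs/remotes/candidate/ namespace are all made up for illustration; review_steps and run_all are hypothetical helpers.

```python
import subprocess


def review_steps(public_bundle, candidate_bundle, review_repo):
    """Return (cwd, argv) pairs for the secure-side review.

    Both bundle paths should be absolute, since most commands run
    inside review_repo.
    """
    # Everything reachable from the candidate's branches but not from
    # the branches that came out of the public bundle.
    new = ["--remotes=candidate", "--not", "--remotes=origin"]
    return [
        # Init a new repo from the public bundle (its branches land under origin/).
        (None, ["git", "clone", public_bundle, review_repo]),
        # Check that the candidate's prerequisites exist in the public history.
        (review_repo, ["git", "bundle", "verify", candidate_bundle]),
        # Fetch the candidate's branches into their own namespace.
        (review_repo, ["git", "fetch", candidate_bundle,
                       "refs/heads/*:refs/remotes/candidate/*"]),
        # Examine the commits (with diffs) the candidate introduces.
        (review_repo, ["git", "log", "-p", *new]),
        # List the commits, trees and blobs behind those commits.
        (review_repo, ["git", "rev-list", "--objects", *new]),
    ]


def run_all(steps):
    """Run each step in order, stopping at the first failure."""
    for cwd, argv in steps:
        subprocess.run(argv, cwd=cwd, check=True)
```

Each object id printed by the last step can then be passed to "git cat-file -p <id>" for a closer look. Note that this only covers objects reachable from the fetched branches; it says nothing about stray objects that might sit in the bundle's pack.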

HTH,
Stephen