Re: propagating repo corruption across clone

Jeff Mitchell <jeffrey.mitchell@xxxxxxxxx> · Mon, 25 Mar 2013 09:43:23 -0400

On Sun, Mar 24, 2013 at 3:23 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Sun, Mar 24, 2013 at 08:01:33PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> On Sun, Mar 24, 2013 at 7:31 PM, Jeff King <peff@xxxxxxxx> wrote:
>> >
>> > I don't have details on the KDE corruption, or why it wasn't detected
>> > (if it was one of the cases I mentioned above, or a more subtle issue).
>>
>> One thing worth mentioning is this part of the article:
>>
>> "Originally, mirrored clones were in fact not used, but non-mirrored
>> clones on the anongits come with their own set of issues, and are more
>> prone to getting stopped up by legitimate, authenticated force pushes,
>> ref deletions, and so on – and if we set the refspec such that those
>> are allowed through silently, we don’t gain much. "
>>
>> So the only reason they were even using --mirror was because they were
>> running into those problems with fetching.

With a normal fetch. We actually *wanted* things like force updates
and ref deletions to propagate, because we have not just Gitolite's
checks but our own checks on the servers, and wanted that to be
considered the authenticated source. Besides just daily use and
preventing cruft, we wanted to ensure that such actions propagated, so
that if a branch was removed because it contained personal
information, accidental commits, or a security issue (for instance),
the branch was also removed on the anongits in a timely fashion.

> I think the --mirror thing is a red herring. It should not be changing
> the transport used, and that is the part of git that is expected to
> catch such corruption.
>
> But I haven't seen exactly what the corruption is, nor exactly what
> commands they used to clone. I've invited the blog author to give more
> details in this thread.

The syncing was performed via a clone with git clone --mirror (and a
git:// URL) and updates with git remote update.
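
For illustration, that sync could be sketched roughly as below. The
URL, the mirror directory, and the function names are placeholders
assumed for the example, not the real setup; the helpers only build
the two command lines and do not run anything.

```python
from typing import List


def initial_clone_cmd(url: str, mirror_dir: str) -> List[str]:
    """Command line for the one-time mirror clone."""
    return ["git", "clone", "--mirror", url, mirror_dir]


def update_cmd(mirror_dir: str) -> List[str]:
    """Command line for each later sync of an existing mirror."""
    return ["git", "--git-dir=" + mirror_dir, "remote", "update"]


# Hypothetical values, for illustration only.
URL = "git://anongit.example.org/project.git"
MIRROR = "/srv/mirrors/project.git"

print(" ".join(initial_clone_cmd(URL, MIRROR)))
print(" ".join(update_cmd(MIRROR)))
```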

So I should mention that my experiments after the fact were using
local paths, but with --no-hardlinks. If you're saying that the
transport is where corruption is supposed to be caught, then it's
possible that we shouldn't see corruption propagate on an initial
mirror clone across git://, and that something else was responsible
for the trouble we saw with the repositories that got cloned
after the fact. But then I'd argue that this is non-obvious. In
particular, when using --no-hardlinks, I wouldn't expect that behavior
to be different with a straight path and with file://.
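
To make the comparison concrete, here is a rough sketch of the two
kinds of invocation being compared. The paths and the helper name are
made up for the example; it only builds the command lines and makes no
claim about which code path git takes internally for each.

```python
from typing import List


def clone_cmd(source: str, dest: str, no_hardlinks: bool = False) -> List[str]:
    """Build a `git clone` command line, optionally with --no-hardlinks."""
    cmd = ["git", "clone"]
    if no_hardlinks:
        cmd.append("--no-hardlinks")
    return cmd + [source, dest]


# Hypothetical repository location, for illustration only.
SRC = "/srv/git/project.git"

# Straight local path, copying objects instead of hardlinking them.
by_path = clone_cmd(SRC, "/tmp/by-path.git", no_hardlinks=True)

# Same repository addressed through a file:// URL.
by_url = clone_cmd("file://" + SRC, "/tmp/by-url.git")

print(" ".join(by_path))
print(" ".join(by_url))
```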

Something else: apparently one of my statements prompted joeyh to
think about potential issues with backing up live git repos
(http://joeyh.name/blog/entry/difficulties_in_backing_up_live_git_repositories/).
Looking at that post made me realize that, when we were doing our
initial thinking about the system three years ago, we made an
assumption that, in fact, taking a .tar.gz of a repo as it's in the
process of being written to or garbage collected or repacked could be
problematic. This isn't a totally baseless assumption, as I once had a
git repository suffer corruption when a sudden power outage hit while
I was in the process of updating it. (It could totally have been the
filesystem, of course, although it was a journaled file system.)

So, we decided to use Git's built-in capabilities of consistency
checking to our advantage (with, as it turns out, a flaw in our
implementation). But the question remains: are we wrong in thinking
that rsyncing or tarring up live repositories in the middle of being
pushed to/gc'd/repacked could result in a bogus backup?
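
One way to lean on git's own consistency checking, sketched under
assumptions: run git fsck --full against the mirror and treat a
non-zero exit status as "do not trust this copy". The function name
and the injectable run parameter are inventions for the example, not a
description of the scripts actually used.

```python
import subprocess
from typing import Callable


def mirror_passes_fsck(mirror_dir: str, run: Callable = subprocess.run) -> bool:
    """Return True if `git fsck --full` exits with status 0 for the mirror.

    `run` defaults to subprocess.run; it is a parameter only so the
    check can be exercised without a real repository.
    """
    result = run(
        ["git", "--git-dir=" + mirror_dir, "fsck", "--full"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Assumption: any non-zero exit status means the copy is suspect.
    return result.returncode == 0
```

A sync job could call this after each update and refuse to publish, or
back up, a mirror for which it returns False.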

Thanks,
Jeff