Re: [PATCH v3] fast-import: checkpoint: only write out refs and tags if we changed them

Elijah Newren <newren@xxxxxxxxx> · Wed, 15 May 2019 17:35:52 -0700

On Tue, May 14, 2019 at 10:30 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Eric Rannaud" <e@xxxxxxxxxxxxxxxx> writes:
>
> > We now keep track of whether branches and tags have been changed by this
> > fast-import process since our last checkpoint (or startup). At the next
> > checkpoint, only refs and tags that new commands have changed are
> > written to disk.
>
> And when we notice that we have been competing with some other
> process and the ref that we wanted to update has already been
> updated to a different value, what happens?  That should be part of
> the description of the new behaviour (aka "fix") written here.
>
> > ---
>
> Missing sign-off.
>
> It sounds like a worthwhile thing to do.
>
> It seems that Elijah has been active in fast-import/export area in
> the last 12 months, so let's borrow a second set of eyes from him.

I glanced over the fast-import.c portion of the code and it seemed
sane, modulo this question about handling of failure to update
branches/tags during a checkpoint.  Didn't read the testcase portion
of the patch as closely yet, but I think the big question is failures
during checkpoints.

First, I have to admit to having never used checkpoints, and I'm
struggling a little bit to understand the usecase here.

For context on my angle, I've been always using --force and 'feature
done', which ensures that either all branches and tags successfully
update, or fast-import terminates with having made no changes (other
than perhaps having written a garbage packfile that can be pruned).
I'm particularly worried about the frontend hitting errors and not
completing its output.  With that in mind, I was surprised to notice
that explicit checkpoints update branches and tags; it seemed
incompatible with --done to me, though it turns out to only be
incompatible with my intended use (no partial updates).

In your case, you are clearly interested in partial updates for some
purpose.  It's easiest to figure out the 'right' behavior for partial
updates in conjunction with --force: as you state, if someone else
updates some ref after a checkpoint and the input stream continues to
change the ref, then tough luck for the external party; we were told
to `--force` so we overwrite.  But the case without --force is most
interesting, as Junio highlighted.  I don't think that case has been
thought about deeply in the past, for two reasons: (1) the fast-import
documentation is somewhat dismissive of checkpoints ("given that a 30
GiB Subversion repository can be loaded into Git through fast-import
in about 3 hours, explicit checkpointing may not be necessary") , (2)
I think most folks assume the repository isn't going to be interacted
with during the fast-import process.  However, the whole reason for
your patch is because checkpoints are in use *and* people are updating
the repo during the fast-import process, so this area that was likely
overlooked in the past now is pretty important to get right.

Looking at the fast-import code, it appears that update_branch and
update_tag both set a global 'failure' flag whenever a branch cannot
be updated due to it not being a fast-forward.  It prints a "warning"
but this is likely because it wants to update whatever branches and
tags it can and report issues on any it can't, and then after that's
done then exit.  Since branches and tags are typically just updated at
the very end of the process (again, based on the documentation
suggesting checkpoints aren't necessary), the end of program exit can
also be thought of as the "exit immediately after failing branch/tag
updates".  My current suspicion is that the current checkpointing code
not checking the 'failure' flag was an oversight.  If the user doesn't
want branches/tags forcibly updated, and they do want checkpoints when
the branches/tags get updated as input comes in, and there is the
possibility that external folks modify thing, then I think it makes
sense to abort the import once the checkpoint is finished (i.e. update
whatever branches/tags we can and then error out); continuing to
process the input stream seems wrong, because:
  * the reason to use checkpoints is because the fast-import process
is taking a long time; if it was quick, there'd be no reason to
manually checkpoint.  Therefore...
  * the user may be forced to backup and redo the long-lived
fast-import process anyway; we've cost them time by not erroring out
immediately
  * we allow errors to accumulate (other branches could have failures
to update later in the fast-import process), causing users to have to
figure out whether one or many failures were reported and then have to
try to track down if they have one or multiple causes
  * the users are less likely to notice early "warning" without
erroring out, but the cause (someone else updating a ref) may be
harder to diagnose the longer it has been since the external ref
update.

That's my impression looking over your change and reading over the
relevant bits of fast-import.c, but as I noted before I don't
understand the usecase and perhaps there's something important I'm not
grasping based on that.  It'd be great if you could provide more
details to help us determine if we need to update the checkpoint code
to appropriately handle errors.

Hope that helps,
Elijah