[PATCH 00/29] Yet more preparation for reference backends

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Wed, 27 Apr 2016 18:57:17 +0200

This started as a modest attempt to move the last couple of patches
mentioned here [1] to before the vtable patches. I wanted to do that
because having real changes mixed with the vtable refactoring was
making rebasing and stuff awkward.

But then it snowballed. A lot of what's new is pretty trivial (though
in some cases fixes minor bugs). But a few of the later patches are
pretty major.

The two main problems addressed by this series are:

1. Reference transactions will, in the future, sometimes span two
   completely different reference backends. For example, we already
   have worktree-specific refs like `HEAD` and `refs/bisect/*`, which
   are stored within the worktree, plus shared refs, which live in the
   main repository. It is even possible for a symref in one reference
   backend to point at a reference in another reference backend. Thus
   we need to be able to split one main transaction into one
   transaction per backend.

2. Currently, reference transactions that involve symbolic refs are
   not atomic: the symref is not locked at all, even when its reflog
   is being updated. This is a no-no. It also means that its referent
   can change between the time that the symref is resolved to find out
   its old_oid and the time that the referent is locked. These aren't
   super serious races because symrefs are usually not modified very
   often (especially on Git servers, which is mostly where concurrent
   modifications are an issue). But still...

The approach taken to solve these problems is inspired by David
Turner's patch [2], whose approach was first discussed here [3]. David
wrote a separate function, dereference_symrefs(transaction, ...),
which scanned through the whole transaction and split up any symref
updates into one symref update with the REF_LOG_ONLY option (to update
the symrefs reflog), plus a second update that changes the underlying
reference. This approach solves problem 1 but not problem 2.

This patch series takes a slightly different approach. For each
reference update during the "lock references" part of
ref_transaction_commit(), it

1. Locks the named reference. (If it is a symref, lock *the symref*,
   not its referent!)

2. Reads the reference non-recursively

3. If it is a symref, change the update to REF_LOG_ONLY and add a
   second update to the transaction for the underlying reference.

4. If it is not a symref but was derived from a symref update, record
   the reference's old_oid in the ref_update structure for the
   original symref.

In this way, each reference in a symref chain is locked down *before*
we read it and the lock is held throughout the transaction. As a
bonus, we don't have to use resolve_ref_full() to find old_oid for the
references; it is enough to read the references *once*, because we do
it under lock.

The three patches starting with "refs: resolve symbolic refs first"
involve a lot of new code in an area that is pretty intricate. I've
reviewed them a few times, but it's quite possible I've made some
mistakes (especially in the error-handling code). Careful review here
would be much appreciated.

It's possible that people will think this is too much new code for
fixing a relatively unlikely race. I'm not absolutely convinced myself
that it's not overengineered. But splitting ref_updates is definitely
needed for supporting multiple backends, and I think the approach of
locking then following one reference at a time is more or less a
prerequisite for making symref locking work correctly with multiple
reference backends. So (I think) our choices are to accept this code
or something like it, or to give up the hope of correct atomicity of
transactions that span backends.

This patch series is also available from my GitHub repository [4] as
branch "split-under-lock". It applies to Junio's current master.

Incidentally, a draft of the *next* patch series, which adds a
ref_store vtable abstraction for managing reference backends, is
available as branch "ref-store" in my GitHub repo. That branch passes
all tests but is not yet carefully reviewed.

Following is a table of contents for this patch series...

Chapter 1: Prologue

  This chapter consists of small cleanups that I noticed while working
  on the code.

  * safe_create_leading_directories(): improve docstring
  * remove_dir_recursively(): add docstring
  * refname_is_safe(): use skip_prefix()

  This patch fixes a bug in refname_is_safe():

  * refname_is_safe(): don't allow the empty string

  This one makes refname_is_safe() a little stricter...it shouldn't
  allow refnames like "refs/foo/../bar///baz" because the downstream
  code isn't smart enough to handle them anyway. (At the latest if the
  reference has to be sought in packed-refs, things would fail):

  * refname_is_safe(): insist that the refname already be normalized

Chapter 2: Character development

  Make sure error output goes the right place:

  * commit_ref_update(): write error message to *err, not stderr

  Tighten up some code, rename some parameters:

  * rename_ref(): remove unneeded local variable
  * ref_transaction_commit(): remove local variable n
  * read_raw_ref(): rename flags argument to type
  * read_raw_ref(): clear *type at start of function
  * read_raw_ref(): rename symref argument to referent
  * read_raw_ref(): improve docstring
  * lock_ref_sha1_basic(): remove unneeded local variable

Chapter 3: Small adventures

  Change error messages to adhere to our new policy of starting with
  lower-case letters:

  * refs: make error messages more consistent

  Require REF_NODEREF if REF_ISPRUNING is set:

  * ref_transaction_create(): disallow recursive pruning

  Correctly handle an (unlikely) error path:

  * ref_transaction_commit(): correctly report close_ref() failure

  Oops, here's a problem that could have bit us later. (In fact, it
  did bite me when I implemented one of the later patches):

  * delete_branches(): use resolve_refdup()

Chapter 4: Flashback

  The following two patches are split out of one of David Turner's
  patches mentioned above:

  * refs: allow log-only updates
  * refs: don't dereference on rename

Chapter 5: The tension builds

  The protagonist clearly has something planned...but what?

  * verify_refname_available(): adjust constness in declaration
  * add_update(): initialize the whole ref_update
  * lock_ref_for_update(): new function
  * unlock_ref(): move definition higher in the file

  Here we force a little more refname safety while building up
  ref_transactions:

  * ref_transaction_update(): check refname_is_safe() at a minimum

Chapter 6: The climax

  This is the guts of the patch series. Everything up to this point
  should be pretty uncontroversial. These three patches are the main
  subject of the discussion above. They are pretty large patches, but
  I don't see a reasonable way to break them up more:

  * refs: resolve symbolic refs first
  * lock_ref_for_update(): don't re-read non-symbolic references
  * lock_ref_for_update(): don't resolve symrefs

Chapter 7: Denouement

  Remove some stuff that's not needed anymore.

  * commit_ref_update(): remove the flags parameter
  * lock_ref_sha1_basic(): only handle REF_NODEREF mode

Michael

[1] http://article.gmane.org/gmane.comp.version-control.git/289994
[2] http://thread.gmane.org/gmane.comp.version-control.git/287949/focus=287958
[3] http://article.gmane.org/gmane.comp.version-control.git/282927
[4] https://github.com/mhagger/git

David Turner (2):
  refs: allow log-only updates
  refs: don't dereference on rename

Michael Haggerty (27):
  safe_create_leading_directories(): improve docstring
  remove_dir_recursively(): add docstring
  refname_is_safe(): use skip_prefix()
  refname_is_safe(): don't allow the empty string
  refname_is_safe(): insist that the refname already be normalized
  commit_ref_update(): write error message to *err, not stderr
  rename_ref(): remove unneeded local variable
  ref_transaction_commit(): remove local variable n
  read_raw_ref(): rename flags argument to type
  read_raw_ref(): clear *type at start of function
  read_raw_ref(): rename symref argument to referent
  read_raw_ref(): improve docstring
  lock_ref_sha1_basic(): remove unneeded local variable
  refs: make error messages more consistent
  ref_transaction_create(): disallow recursive pruning
  ref_transaction_commit(): correctly report close_ref() failure
  delete_branches(): use resolve_refdup()
  verify_refname_available(): adjust constness in declaration
  add_update(): initialize the whole ref_update
  lock_ref_for_update(): new function
  unlock_ref(): move definition higher in the file
  ref_transaction_update(): check refname_is_safe() at a minimum
  refs: resolve symbolic refs first
  lock_ref_for_update(): don't re-read non-symbolic references
  lock_ref_for_update(): don't resolve symrefs
  commit_ref_update(): remove the flags parameter
  lock_ref_sha1_basic(): only handle REF_NODEREF mode

 builtin/branch.c        |  19 +-
 cache.h                 |   5 +
 dir.h                   |  23 ++
 refs.c                  |  67 ++--
 refs/files-backend.c    | 882 +++++++++++++++++++++++++++++++++++++-----------
 refs/refs-internal.h    |  55 ++-
 t/t1400-update-ref.sh   |  41 ++-
 t/t1430-bad-ref-name.sh |   2 +-
 t/t3200-branch.sh       |   9 +
 9 files changed, 869 insertions(+), 234 deletions(-)

-- 
2.8.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html