Re: [PATCH 5/5] ref_transaction_commit(): only keep one lockfile open at a time

On 04/24/2015 01:35 PM, Michael Haggerty wrote:
> The old code locked all of the references in the first loop, leaving
> all of the lockfiles open until later loops. Since we might be
> updating a lot of references, this could cause file descriptor
> exhaustion.
> 
> But there is no reason to keep the lockfiles open after the first
> loop:
> 
> * For references being deleted or verified, we don't have to write
>   anything into the lockfile, so we can close it immediately.
> 
> * For references being updated, we can write the new SHA-1 into the
>   lockfile immediately and close the lockfile without renaming it into
>   place. In the second loop, the already-written contents can be
>   renamed into place using commit_ref_update().
> 
> To simplify the bookkeeping across loops, add a new REF_NEEDS_COMMIT
> bit to update->flags, which keeps track of whether the corresponding
> lockfile needs to be committed, as opposed to just unlocked.
> 
> This change fixes two tests in t1400.

I realize that I should have highlighted another, perhaps more serious
bug that is fixed by this change. The old code was roughly

    for update in updates:
        acquire locks and check old_sha1
    for update in updates:
        if changing value:
            write_ref_to_lockfile()
            commit_ref_update()
    for update in updates:
        if deleting value:
            unlink()
    for update in updates:
        if reference still locked:
            unlock_ref()

The atomicity of the update depends on all preconditions being checked
in the first loop, before any changes start being committed. The
problem is that write_ref_to_lockfile() (formerly part of
write_ref_sha1()), which runs in the second loop, checks two more
preconditions:

* It verifies that new_sha1 is a valid object.
* If the reference being updated is a branch, it verifies that new_sha1
  points at a commit object (as opposed to a tag, tree, or blob).
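
Concretely, those checks look roughly like this (paraphrased from
refs.c from memory; exact messages and error handling approximate):

    static int write_ref_to_lockfile(struct ref_lock *lock,
                                     const unsigned char *sha1)
    {
        struct object *o = parse_object(sha1);

        if (!o) {
            /* new_sha1 doesn't name any object in the repository */
            error("Trying to write ref %s with nonexistent object %s",
                  lock->ref_name, sha1_to_hex(sha1));
            unlock_ref(lock);
            return -1;
        }
        if (o->type != OBJ_COMMIT && is_branch(lock->ref_name)) {
            /* branches may only point at commits */
            error("Trying to write non-commit object %s to branch %s",
                  sha1_to_hex(sha1), lock->ref_name);
            unlock_ref(lock);
            return -1;
        }
        if (write_in_full(lock->lock_fd, sha1_to_hex(sha1), 40) != 40 ||
            write_in_full(lock->lock_fd, "\n", 1) != 1 ||
            close_ref(lock) < 0) {
            error("Couldn't write %s", lock->ref_name);
            unlock_ref(lock);
            return -1;
        }
        return 0;
    }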

If either of these preconditions fails, the "transaction" might be
aborted after some reference updates have already been permanently
committed. For example, a two-update transaction whose second update
points a branch at a tree object could commit the first update before
failing on the second. In other words, the all-or-nothing promise of
"git update-ref --stdin" and "git push --atomic" would be violated.

I'm sorry that I didn't fully realize this problem earlier.

After this patch, the code looks like

    for update in updates:
        acquire locks and check old_sha1
        if changing value:
            write_ref_to_lockfile()
    for update in updates:
        if changing value:
            commit_ref_update()
    for update in updates:
        if deleting value:
            unlink()
    for update in updates:
        if reference still locked:
            unlock_ref()
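
For concreteness, the first two loops now look roughly like this
(paraphrased from the patch; signatures and error handling simplified):

    /* Loop 1: lock all refs, check old values, pre-write new values */
    for (i = 0; i < n; i++) {
        struct ref_update *update = updates[i];

        update->lock = lock_ref_sha1_basic(update->refname,
                                           update->old_sha1, NULL,
                                           update->flags, &update->type);
        if (!update->lock)
            goto cleanup; /* nothing committed yet; rollback is safe */

        if (!is_null_sha1(update->new_sha1)) {
            /*
             * Write new_sha1 into the lockfile and close its file
             * descriptor; this is also where the object-existence
             * and branch-must-point-at-commit checks now run:
             */
            if (write_ref_to_lockfile(update->lock, update->new_sha1))
                goto cleanup; /* still safe to roll back */
            update->flags |= REF_NEEDS_COMMIT;
        } else if (close_ref(update->lock)) {
            /* deletion or verify: nothing to write; just free the fd */
            goto cleanup;
        }
    }

    /* Loop 2: rename the already-written lockfiles into place */
    for (i = 0; i < n; i++) {
        struct ref_update *update = updates[i];

        if (update->flags & REF_NEEDS_COMMIT) {
            if (commit_ref_update(update->lock, update->new_sha1,
                                  update->msg))
                goto cleanup; /* past the point of no return */
            update->lock = NULL;
        }
    }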

This has the important effect that the pre-checks in
write_ref_to_lockfile() are carried out before any changes have been
committed, so if any of those checks fail, the transaction can be rolled
back correctly.

Given that "git push --atomic" is one of the main new features of 2.4.0,
it would be unfortunate for the release to contain this bug, plus the
bug that atomic pushes can fail due to file descriptor exhaustion.

Is this problem important enough to justify delaying the release to get
the fix in?

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
