Re: [PATCH v8 41/44] refs.c: add a new flag for transaction delete for refs we know are packed only

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Thu, 29 May 2014 00:23:53 +0200

On 05/28/2014 06:58 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
> 
>> I think for any two backends that are stacked, you would need to break
>> down a transaction as follows (here generalized to include not only
>> deletions):
>>
>>     packed->begin_transaction()
>>     loose->begin_transaction()
>>
>>     # And this part is done within stacked->commit_transaction():
>>     for entry in many_ref_updates():
>>         if have_old:
>>             stacked->verify_reference(ref, old_sha1)
>>
>>         if entry is a delete:
>>             packed->delete_reference(entry)
>>         loose->update_reference(entry)
>>
>>     if (!packed->commit_transaction())
>>         loose->commit_transaction()
> 
> Ugggly.

The ugliness would be encapsulated in the stacked reference backend.  It
wouldn't have to appear in client code.

> In general you would need to worry about the case where the
> first commit succeeds and then the second commit fails, wouldn't
> you?
> 
> The above happens not to break horribly wrong for the "Loose on top
> of Packed" layering, in that the refs you wanted to delete are only
> gone from Packed but still are in Loose, and the end result would be
> some of them are really gone (because they weren't in Loose) and
> some others remain (because they were in Loose), and at least you
> did not lose any ref you did not want to discard.  But it still is
> not what should happen in a proper "transaction".

True, but we don't have proper transactions anyway.  If anything goes
wrong when renaming one of the loose reference lockfiles, we have no way
of reliably rolling back the loose references that have already been
changed.  The word "transaction" as we use it really only suggests that
we are doing our best to change the references all-or-nothing even in
the face of concurrent modifications by other processes.  It certainly
doesn't protect against power failures and stuff like that.

I suppose if you wanted to make reference deletions a little bit more
transaction-like, you could split them into two steps to "loosen" any
packed references and a deletion step that deletes the loose references.
 The advantage of this scheme is that each step requires a transaction
on only a single back end at a time, though it requires part or all of
the other back end to be locked simultaneously.

    # The first two iterations "loosen" any packed references
    # corresponding to references that are to be deleted.  It is purely
    # an internal reorganization that doesn't change the externally-
    # visible values of any references and can be done within a
    # separate transaction:
    for each reference to delete:
        if reference exists packed but not loose:
            create a loose ref with the value from the packed ref
    for each reference to delete:
        if reference exists packed:
            delete packed version of reference

    # Now we only have to delete the loose version of the references.
    # This should be done after activating the packed reference file
    # but while continuing to hold the packed-refs lock:
    for each reference to delete:
        delete loose version of reference

I'd have to think more about whether this all really works generically
without requiring lots of extra round-trips to a possibly-remote
backend.  But I'm not sure we really need this composability in its full
generality.  For example, maybe it is impossible to stack a low-latency
and a high-latency backend together while minimizing round-trips to the
latter.

>>> But the above would not quite work, as somebody needs to remove logs
>>> for refs that were only in the Packed backend, and "repack without
>>> these refs" API supported by the Packed backend cannot be that
>>> somebody---"repack packed-refs without A B C" cannot unconditionally
>>> remove logs for A B C without checking if A B C exists as Loose.
>>
>> Correct.  That's another reason that logging has to be the
>> responsibility of something at the "stacked" level of abstraction or higher.
>>
>> I think the logging should be done by yet another outer layer of
>> wrapper that only does the logging, while also passing all updates
>> down 1:1 to the backend that it wraps (which in our case would be
>> a stacked backend). ... Then the loose and packed backends could
>> remain completely ignorant of the fact that reference updates can
>> be logged.
> 
> That would mean that Loose (or Packed) class cannot be used as-is
> and always needs to be wrapped with the layer that does the logging,
> no?

Correct.

> In a system with "only packed-refs, no loose", you would want
> Packed.deleteRefs() to remove the named refs ref and their reflogs,
> but that would mean that the Layered wrapper that uses Loose
> overlayed on Packed cannot call that method, because it does not
> want reflogs of the refs in Packed covered by the ones in Loose.

The two scenarios would be roughly

    refhandler = new LoggingWrapper(new PackedRefs())

and

    refhandler = new LoggingWrapper(
        new StackedRefs(new LooseRefs(), new PackedRefs())
        )

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html