Re: [PATCH v8 41/44] refs.c: add a new flag for transaction delete for refs we know are packed only

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Fri, 23 May 2014 23:45:36 +0200

On 05/23/2014 05:53 PM, Jonathan Nieder wrote:
> Hi,
> 
> Michael Haggerty wrote:
> 
>> The status quo is that we have a single reference back end consisting of
>> loose references sitting on top of packed references.
>>
>> But really, loose references and packed references are two relatively
>> independent reference back ends [1].  We just happen to use them layered
>> on top of each other.
>>
>> This suggests to me that our current structure is best modeled as two
>> independent reference back ends, with a third implementation of the same
>> reference API whose job it is to compose the first two.
> [...]
>> [1] Forget for the sake of this discussion that we can't store symbolic
>> references as packed refs.
> 
> I find it hard to forget that. ;-)  More to the point, the trouble
> with loose refs and packed refs as independent reference backends is
> that neither has very good performance characteristics.  Enumerating
> many loose refs is slow.  Adding a new packed ref to a large list is
> also slow.  Git currently uses both loose and packed refs in a way
> that allows each to overcome the limitations of the other, and the
> fact that it involves two on-disk data structures seems to me like an
> implementation detail of how it achieves that.

I'm not advocating that we use loose refs or packed refs alone.  But I
like the code decoupling that this implementation would (I predict) yield.

My main point was that pack-refs is not an integral part of the
reference API but rather a tuning feature very specific to the
loose/packed reference storage scheme.

> So I believe most git code should not have to know about the
> difference between loose and packed refs (or the upper and lower
> layer), so that the details of the layering can be tuned in low-level
> ref handling code.
> 
> On the other hand, from a code structure perspective I can easily
> believe that implementing some subset (or maybe even all) of the
> reference backend API for loose refs and packed refs separately and
> providing a separate file describing how to compose them might be the
> cleanest way to write this code.  It's more general layering that
> seems to lie in the direction of madness.
> 
> Maybe I'm wrong and people will find lots of use for combinations like
>  * loose refs shadowing an sqlite database
>  * tdb shadowing mysql
>  * etc
> It's easy to prove a naysayer wrong with code and I don't want to
> discourage that.

I admit that I don't have any other layered storage schemes in mind.

> For the topic at hand it's relevant because packed-refs have
> properties that make some operations (certain deletion/ref creation
> combinations) much less fussy than loose refs, and it would be nice to
> be able to take advantage of that.  In the long term I would like to
> see git taking advantage of that when someone tries to fetch refs with
> names that would conflict on the filesystem (e.g., topic, topic/a,
> topic/b).

A transition to allowing D/F-conflicting reference names has two very
distinct aspects to it:

1. Changing how references (and reflogs!) are stored, to make it
technically possible to store such references.

2. Removing restrictions on actually creating such references.

We can take step 1 at any time because it is a purely local decision,
though I think it would require a repository format bump.
Even though you could work around the D/F problem for references by
packing problematic ones, it is a kludge with a potentially significant
performance cost.  And we have the same problem with reflogs, with no
analogous kludge.  I'd rather make a clean break, for example mapping
reference names into filenames using some kind of escaping of special
characters and suffixes like ".d" and ".f" to prevent directories and
files from conflicting.  Depending on the OS and/or filesystem, we might
also escape all non-ASCII characters, or even all non-lower-case ASCII
characters, to prevent problems with case sensitivity, internal vs.
filesystem character encodings, and NFC vs. NFD normalization.

But we won't want to take step 2 until Git clients capable of step 1
are widespread; otherwise people with different client versions will
have trouble collaborating.  Maybe step 2 should be governed by a
configuration option with three settings:

    FORBIDDEN - don't allow references with D/F conflicts to exist
        in this repository

    NO_CREATE - don't allow the creation of such references locally,
        but accept them from remote sources via commands like "fetch".
        This setting could be used to avoid creating problems for
        collaborators.

    ALLOWED - no restriction on the creation of references with D/F
        conflicts.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
