Re: [PATCH v2 38/43] refs: make some files backend functions public

David Turner <dturner@xxxxxxxxxxxxxxxx> · Wed, 07 Oct 2015 16:55:11 -0400

On Wed, 2015-10-07 at 18:00 +0200, Michael Haggerty wrote:
> On 10/07/2015 03:25 AM, David Turner wrote:
> > On Mon, 2015-10-05 at 11:03 +0200, Michael Haggerty wrote:
> >> On 09/29/2015 12:02 AM, David Turner wrote:
> >>> Because HEAD and stash are per-worktree, other backends need to
> >>> go through the files backend to manage these refs and their reflogs.
> >>>
> >>> To enable this, we make some files backend functions public.
> >>
> >> I have a bad feeling about this change.
> >>
> >> Naively I would expect a reference backend that cannot handle its own
> >> (e.g.) stash to instantiate internally a files backend object and to
> >> delegate stash-related calls to that object. That way neither class's
> >> interface has to be changed.
> >>
> >> Here you are adding a separate interface to the files backend. That
> >> seems like a more complicated and less flexible design. But I'm open to
> >> being persuaded otherwise...
> > 
> > After some thought, here's a summary of the problem:
> > 
> > Some writes are cross-backend writes.  For example, if HEAD is a symref to
> > refs/heads/master, a commit is a cross-backend write (HEAD itself is not
> > updated, but its reflog is).  Ronnie's design of the ref backend
> > structure did not account for cross-backend writes, because we didn't
> > have per-worktree refs at the time (there was only HEAD, and there was
> > only one copy of it).
> > 
> > Cross-backend writes are complicated because there is no way to tell a
> > backend to do only part of a ref update -- for instance, to tell the
> > files backend to update HEAD and HEAD's reflog but not
> > refs/heads/master.  Maybe we could set a flag that would do this, but
> > the synchronization would be fairly complicated.  For instance, an
> > update to HEAD might need to confirm the old sha for HEAD, meaning that
> > we couldn't do the db write first.  But if we move the db write second,
> > then when the db code goes to do its check of the HEAD sha, it might see
> > a new value.  Perhaps there's a way to make it work, but it seems
> > fragile/complex.
> > 
> > Right now, for cross-backend reads/writes, the lmdb code cheats. It
> > simply does the write directly and immediately.  This means that these
> > portions of transactions cannot be rolled back.  That's clearly bad. 
> 
> That's a really good point.
> 
> I hate to break it to you, but the handling of symrefs in Git is already
> a mess. HEAD is the only symref that I would really trust to work
> correctly all the time. So I think that changes needn't be judged on
> whether they handle symrefs perfectly. They should just not break them
> in any dramatic new ways.
> 
> So, you pointed out the problem that HEAD (a per-worktree reference) can
> be a symref that points at a shared reference. In fact, I think when
> HEAD is symbolic it is only allowed to point at a branch under
> refs/heads, so this particular problem is pretty well-constrained.
> 
> Are there other cases of cross-backend writes? I suppose there could be
> a symref elsewhere among the per-worktree references that points at a
> shared reference. But I can't think of any cases where this is done by
> standard Git. Not that it is forbidden; I just don't think it is done by
> any of the standard tools.

Another case would be an update-ref command that updates both
refs/bisect/something (per-worktree) and refs/heads/something (shared)
in a single transaction.

I don't think git ever does this by default, but anyone can issue a
weird update-ref command if they feel like it.

> Or there could be a symref among the shared references that points at a
> per-worktree reference. But AFAIK the only other symrefs that are in
> common use are the refs/remotes/*/HEAD symrefs, and they always point at
> references within the same (shared) namespace.
> 
> If everything that I've said is correct, then my opinion is that it
> would be perfectly adequate if your code would handle the specific case
> of HEAD (by hook or by crook), and if there are any other cross-backend
> symrefs, just die with a message stating that such usage is unsupported.
> Junio, do you think that would be acceptable?

Hm.  I don't think it's significantly easier to handle just HEAD than
it would be to handle all cases.  But I'll see what happens as I write
the code.

> > The simplest solution would be for the lmdb code to simply acquire
> > locks, and write to lock files, and then commit those lock files just
> > before the db transaction commits. Then the lmdb code would handle all
> > of the orchestration without the files backend having to be rewritten to
> > handle this case.
> 
> Wouldn't that essentially be re-implementing the files backend? I must
> be missing something.

There would be some amount of reimplementation, yes.  But if we assume
that the number of per-worktree refs is relatively small, we could make
some simplifications.  But actually, see below.

> > [...]
> 
> BTW I just realized that if one backend should delegate to another, then
> the primary backend should be the per-worktree backend and it should
> delegate to the common backend. I think I described things the other way
> around in my earlier message. This makes more sense because it is
> acceptable for per-worktree references to refer to common references but
> not vice versa.

I think I might have a good way to deal with this:

If we're going to switch the lmdb transaction code over to accumulate
updates and then do them as one batch, then probably all other
backends will work the same way.  So maybe there is no need for all of
these backend functions:

	ref_transaction_begin_fn *transaction_begin;
	ref_transaction_update_fn *transaction_update;
	ref_transaction_create_fn *transaction_create;
	ref_transaction_delete_fn *transaction_delete;
	ref_transaction_verify_fn *transaction_verify;

Instead, the generic refs code will accumulate updates in a struct
ref_update.  Instead of a lock, the ref_update struct will have a void
pointer that backends can use for per-update data (such as the lock).
The generic code can also handle rejecting duplicate ref updates.

The per-backend transaction_commit method will just take a struct
ref_transaction (that is, what the current patchset calls a
files_ref_transaction) -- basically, a list of ref_updates -- and
attempt to apply it.
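
To make the shape concrete, here is a rough sketch of what the generic
side could look like.  Everything in it is an assumption for
illustration: the struct layouts are simplified, transaction_queue(),
transaction_free() and the backend_data field are hypothetical names,
and the commit typedef only guesses at a signature.  Duplicate
detection is a plain linear scan here, which is enough to show the
idea but is not necessarily what the real code would do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types; not the actual structs in refs.c. */
struct ref_update {
	char *refname;
	unsigned char new_sha1[20];
	unsigned char old_sha1[20];
	void *backend_data;	/* opaque per-update slot, e.g. a lock */
};

struct ref_transaction {
	struct ref_update **updates;
	size_t nr, alloc;
};

/* Assumed shape of the single per-backend entry point. */
typedef int ref_transaction_commit_fn(struct ref_transaction *tx);

/*
 * Queue an update in the generic code.  Returns 0 on success and -1
 * if refname is already queued or memory runs out.
 */
int transaction_queue(struct ref_transaction *tx, const char *refname,
		      const unsigned char *new_sha1,
		      const unsigned char *old_sha1)
{
	struct ref_update *update;
	size_t i;

	assert(tx && refname && new_sha1 && old_sha1);

	/* Reject duplicates before any backend sees the update. */
	for (i = 0; i < tx->nr; i++)
		if (!strcmp(tx->updates[i]->refname, refname))
			return -1;

	if (tx->nr == tx->alloc) {
		size_t alloc = tx->alloc ? 2 * tx->alloc : 8;
		struct ref_update **grown =
			realloc(tx->updates, alloc * sizeof(*grown));
		if (!grown)
			return -1;
		tx->updates = grown;
		tx->alloc = alloc;
	}

	update = calloc(1, sizeof(*update));
	if (!update)
		return -1;
	update->refname = malloc(strlen(refname) + 1);
	if (!update->refname) {
		free(update);
		return -1;
	}
	strcpy(update->refname, refname);
	memcpy(update->new_sha1, new_sha1, 20);
	memcpy(update->old_sha1, old_sha1, 20);
	tx->updates[tx->nr++] = update;
	return 0;
}

/* Release everything queued; backend_data is owned by the backend. */
void transaction_free(struct ref_transaction *tx)
{
	size_t i;

	for (i = 0; i < tx->nr; i++) {
		free(tx->updates[i]->refname);
		free(tx->updates[i]);
	}
	free(tx->updates);
	tx->updates = NULL;
	tx->nr = tx->alloc = 0;
}
```

With something like this, a backend never sees a partially built
transaction; it only gets the finished list at commit time and can
hang whatever it needs off backend_data.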

While we're doing this, the generic ref code can detect an update to
HEAD, and replace it with an update to whatever HEAD points to (if HEAD
is a symref).  Then, if the main transaction commits successfully, it
can call files_log_ref_write to write to HEAD's reflog.  If HEAD is not
a symref, the generic code can just move the HEAD update over to the
files backend.
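
As a sketch of that routing decision only (not of the actual write
path): route_update(), struct routed_update and the enum below are
hypothetical names, and head_target stands in for however the generic
code would resolve HEAD.  It covers HEAD alone; other per-worktree
refs such as refs/bisect/* would need their own rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: which backend ends up applying the update. */
enum update_target { TARGET_MAIN_BACKEND, TARGET_FILES_BACKEND };

struct routed_update {
	const char *refname;	/* ref that actually gets written */
	enum update_target target;
	int log_head;		/* write HEAD's reflog after a successful commit */
};

/*
 * head_target is the ref HEAD points at when HEAD is a symref, or
 * NULL when HEAD is not a symref.
 */
struct routed_update route_update(const char *refname,
				  const char *head_target)
{
	struct routed_update r;

	assert(refname);
	r.refname = refname;
	r.target = TARGET_MAIN_BACKEND;
	r.log_head = 0;

	if (strcmp(refname, "HEAD"))
		return r;	/* not HEAD: leave it with the main backend */

	if (head_target) {
		/* Symref: update the pointed-to ref, log HEAD afterwards. */
		r.refname = head_target;
		r.log_head = 1;
	} else {
		/* Not a symref: the files backend owns the whole update. */
		r.target = TARGET_FILES_BACKEND;
	}
	return r;
}
```

The log_head flag is what would trigger the files_log_ref_write call
once the main transaction has committed.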

Does this make sense?
