Re: [PATCH 1/3] retain reflogs for deleted refs

On Fri, Jul 20, 2012 at 11:49:07AM +0200, Michael Haggerty wrote:

> >This patch moves reflog entries into a special "graveyard"
> >namespace, and appends a tilde (~) character, which is
> >not allowed in a valid ref name. This means that the deleted
> >reflogs of these refs:
> >
> >    refs/heads/a
> >    refs/heads/a/b
> >    refs/heads/a/b/c
> >
> >will be stored in:
> >
> >    logs/graveyard/refs/heads/a~
> >    logs/graveyard/refs/heads/a/b~
> >    logs/graveyard/refs/heads/a/b/c~
> >
> >Putting them in the graveyard namespace ensures they will
> >not conflict with live refs, and the tilde prevents D/F
> >conflicts within the graveyard namespace.
> 
> I agree with Junio that long-term, it would be nice to allow
> references "foo" and "foo/bar" to exist simultaneously.  To get
> there, we would have to redesign the mapping between reference names
> and the filenames used for the references and for the reflogs.

Yes, I would really like that, as it could make the alternate namespace
go away, which is the source of about half the code in my patches (i.e.,
we would only need to loosen the reflog reading code to handle reflogs
that do not have a matching ref).

But I fear that the fallout from that will be much, much larger. Even
with just this change, older versions of git will be slightly unhappy
(e.g., you will get some extra warnings during fsck and reflog
expiration about these reflogs). But changing the on-disk representation
of the refs namespace will mean a totally new representation of locking.
That's going to break old versions of git completely, and possibly even
some user scripts.

> The easiest thing would be to mark files and directories differently;
> something like
> 
>     $GIT_DIR/{,logs/}refs/heads/a/b/c~
> [...]
> The first convention, "logs/refs/heads/a/b/c~" is not usable because
> a reflog for a dead reference with this name would conflict with a
> reflog for a live reference "heads/a" or "heads/a/b" that uses the
> current filename convention.

Right. That's what I started with, then created the graveyard hierarchy
to avoid conflicts between the "old" namespace (that cannot handle D/F
conflicts) and the "new" one (that can, because it represents files and
directories differently).
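
To make the conflict concrete, here is a rough sketch in Python (not code
from the patch, which is C; the helper names graveyard_path and
has_df_conflict are made up for illustration). It assumes only the mapping
described above: prefix "logs/graveyard/" and append "~".

```python
def graveyard_path(refname):
    """Map a deleted ref name to its reflog path, relative to $GIT_DIR.

    Hypothetical helper. '~' is not allowed in a valid ref name, so the
    file "a~" can never collide with the directory "a".
    """
    return "logs/graveyard/" + refname + "~"


def has_df_conflict(paths):
    """True if some path must be a file while another needs it as a directory."""
    files = set(paths)
    for path in paths:
        parts = path.split("/")
        # Check every proper leading prefix of this path.
        for i in range(1, len(parts)):
            if "/".join(parts[:i]) in files:
                return True
    return False


refs = ["refs/heads/a", "refs/heads/a/b", "refs/heads/a/b/c"]
live_style = ["logs/" + r for r in refs]
dead_style = [graveyard_path(r) for r in refs]

print(has_df_conflict(live_style))  # True
print(has_df_conflict(dead_style))  # False

# A dead "a/b/c~" sharing the hierarchy with a live "a" still conflicts;
# moving it under the graveyard prefix does not.
print(has_df_conflict(["logs/refs/heads/a", "logs/refs/heads/a/b/c~"]))  # True
print(has_df_conflict(["logs/refs/heads/a",
                       graveyard_path("refs/heads/a/b/c")]))  # False
```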

> or
> 
>     $GIT_DIR/{,logs/}refs/heads~/a~/b~/c
> 
> i.e., munging either directory or file names to strings that are
> illegal in refnames such that it is unambiguous from the name whether
> a path is a file or directory.

This one can have conflicts in the opposite direction if you don't have
any directories. E.g., you have $GIT_DIR/foo, a deleted ref, which has
no tildes because it has no directories in the path. But you want to
create foo/bar under the "old" system, which cannot happen (under the
new system, it is fine, but the point of this exercise is to overlay the
old and new systems).

That may be an OK tradeoff. We are restrictive in what goes into the
top-level. Although I notice that you did not mark "refs" in the above
example. So you could have the same problem with "refs/stash", for
example. Again, though, we don't tend to have arbitrary data at the
top-level (and I think refs/stash gets special-cased in a couple of places
already). So it might be an acceptable limitation.
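
The two cases above can be sketched like this (Python, purely illustrative;
mark_dirs, blocks, and the "unmarked" parameter are hypothetical names, and
"refs/stash/x" is an invented ref used only to show the collision):

```python
def mark_dirs(refname, unmarked=0, marker="~"):
    """Append `marker` to each directory component of refname.

    The first `unmarked` components are left alone, which mimics the
    quoted example where "refs" itself carries no tilde.
    """
    *dirs, leaf = refname.split("/")
    marked = [d if i < unmarked else d + marker for i, d in enumerate(dirs)]
    return "/".join(marked + [leaf])


def blocks(file_path, wanted_path):
    """True if file_path exists as a file where wanted_path needs a directory."""
    return wanted_path.startswith(file_path + "/")


print(mark_dirs("refs/heads/a/b/c", unmarked=1))  # refs/heads~/a~/b~/c

# A dead top-level "foo" has no directories to mark, so it stays "foo"
# and blocks an old-style "foo/bar".
print(blocks(mark_dirs("foo"), "foo/bar"))  # True

# Same problem one level down when "refs" is left unmarked.
print(blocks(mark_dirs("refs/stash", unmarked=1), "refs/stash/x"))  # True

# Marking "refs" too would avoid that particular collision.
print(blocks(mark_dirs("refs/stash"), "refs/stash/x"))  # False
```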

If we want to be pedantic, my patch causes conflicts for top-level refs
called "graveyard" (although I know we have talked about restricting
top-level refs to [A-Z_-], I don't recall if that has actually
happened).

> And *if* we did that, then we wouldn't need a separate "graveyard"
> namespace, would we?  The reflogs for dead references could live
> among those for living references.

Right, assuming the limitation above is OK. But note that it doesn't
really save us any code. We still have to convert between refnames and
graveyard versions. _Eventually_ if the refnames were all converted,
that code could go away.

> But the second convention, "logs/refs/heads~/a~/b~/c", cannot conflict
> with current reflog files.  And it would be a step towards allowing
> "foo" and "foo/bar" at the same time.  What do you think about using
> a convention like this instead of the one that you proposed?

I think it's reasonable. As I said, it doesn't save any code _now_, but since
I am pulling a convention out of thin air, it might as well be one that
has a possibility of converging in the future (all other things being
equal, of course; I do find marking the directories a little uglier to
read, but that is mostly because of the tilde).

> Another minor concern is the choice of trailing tilde in the file or
> directory names.  Given that emacs creates backup files by appending
> a tilde to the filename, (1) it would be easy to inadvertently create
> such files, which git might try to interpret as reflogs and (2) there
> might be tools that innately "know" to skip such files in their
> processing. ack-grep, a replacement for grep, is an example that
> springs to mind.

The use of "~" for backup files was actually something that made me
choose it, since these are, after all, backups of the reflog. But they
are probably more precious than editor backup files, so the special
treatment they're given by other programs is probably not desirable.

> Other possibilities (according to git-check-ref-format(1)):
> 
>     refs/.heads/.a/.b/c
>     refs/heads./a./b./c (problematic on some Windows filesystems?)
>     refs/heads../a../b../c
>     refs/heads~dir/a~dir/b~dir/c (or some other suffix)
>     refs/heads..a..b..c (not recommended because it flattens
> directory hierarchy)

I don't like leading-dot, because those files are also often skipped by
directory traversal of some programs (and certainly they are confusing
to work with if you try to use "ls" to debug your $GIT_DIR/logs
directory). Trailing dot is less ugly to me, but I do wonder about its
special meaning as an extension separator. Double-dots just look gross.

Note that we have a few other magic characters available, too. Colon is
probably the least offensive (metacharacters like *, ?, and [ just make
things unnecessarily painful for shell users).

So I think a suffix like ":d" is probably the least horrible.
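
For illustration only, a ":d" directory suffix could be applied and undone
roughly as follows (a Python sketch with hypothetical names to_path and
from_path; whether every component, including "refs", gets the suffix is an
assumption here, not something settled in this thread):

```python
DIR_SUFFIX = ":d"  # ':' is not allowed in ref names, so this is unambiguous


def to_path(refname):
    """Turn a ref name into a path whose directories all end in ':d'."""
    *dirs, leaf = refname.split("/")
    return "/".join([d + DIR_SUFFIX for d in dirs] + [leaf])


def from_path(path):
    """Inverse of to_path; rejects paths that do not follow the convention."""
    *dirs, leaf = path.split("/")
    if not all(d.endswith(DIR_SUFFIX) for d in dirs) or leaf.endswith(DIR_SUFFIX):
        raise ValueError("not a ':d'-style path: " + path)
    return "/".join([d[:-len(DIR_SUFFIX)] for d in dirs] + [leaf])


path = to_path("refs/heads/a/b/c")
print(path)             # refs:d/heads:d/a:d/b:d/c
print(from_path(path))  # refs/heads/a/b/c
```

With this scheme the file "refs:d/heads:d/a" and the directory
"refs:d/heads:d/a:d" have different names, so "a" and "a/b" can coexist.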

-Peff