Re: [PATCH v2 0/7] Improved infrastructure for refname normalization

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Tue, 13 Sep 2011 06:16:13 +0200

On 09/12/2011 06:44 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
>> OTOH I am again having serious doubts that trying to support
>> unnormalized refnames is a good idea.
> 
> Sorry, I am confused. I thought the way you are planning forward is to
> leave unnormalized ones unchecked as the current code does (and mark the
> places that do so with _unsafe()), with the eventual goal of fixing the
> caller to pass only normalized ones to make call to the "safe" version?

That was the plan, but after my experience trying to fix this problem I
have come to doubt that it is doable within a reasonable amount of work
or even that support for unnormalized refnames is desirable.  The
problem is that the API provided by refs.{c,h} is far from waterproof,
and I keep finding more code elsewhere that manipulates and parses
refnames and makes implicit assumptions (sometimes spread over many
functions) about their form.

Consistency of the UI should be the goal.  Supporting unnormalized
refnames some places, but not others, will just confuse and frustrate
users.  The only two consistent alternatives are

1. Unnormalized refnames are supported everywhere in the UI that
refnames are accepted, including clients like gitweb, gitk, egit, etc.

2. Only normalized refnames are supported; unnormalized refnames are
errors that we report on a best-effort basis.

Option (1) has a number of problems:

* Claiming to support unnormalized refnames is far from the current
situation; therefore lots of current code would have to be considered
broken.

* Fixing the code requires many new unit tests and fixes to many areas
of the code, including clients outside of the main git project.  I have
tried fixing a couple of examples ("git branch", "git rev-parse", and
the first argument of "git update-ref") and it is pretty messy.

* Some of the needed changes seem like they might conflict with other
forms of DWIM; for example, the ambiguous_path() kludge.

* The extra copying needed for normalization has a runtime cost and
complicates memory management.

* Unnormalized refnames are *themselves* a form of UI inconsistency and
therefore not a very noble goal.  It is better that people learn that
each reference has a single name, and unlearn that references were once
1:1 with files under .git/refs.

What is the benefit of (2) that justifies all of this work?  To enable
sloppy script writers to throw slashes around carelessly?

Option (1) would be far easier.  Then code only needs to treat
unnormalized refnames like any other kind of invalid refname rather than
making sure that unnormalized refnames work properly in combination with
all other features.

So I propose the following:

* Institute a policy that refnames in the UI must always be normalized

* On a best-effort basis, emit errors whenever unnormalized refnames are
encountered

* Continue to support "git check-ref-format --print", which script
writers can use to normalize refnames if they need to.

The only disadvantage of a stricter policy is that some of today's
sloppy scripts, which today might sometimes work (but not reliably),
break in a pretty obvious way that can be fixed with a single call to
"git check-ref-format --print".

I'd rather get beyond this swamp and start working on the hierarchical
reference cache, which will bring some real benefits.  (The hierarchical
reference cache requires some sanity in refname handling, which is why I
got into this mess in the first place.)

What do people think?
Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html