On 09/12/2011 06:44 PM, Junio C Hamano wrote: > Michael Haggerty <mhagger@xxxxxxxxxxxx> writes: >> OTOH I am again having serious doubts that trying to support >> unnormalized refnames is a good idea. > > Sorry, I am confused. I thought the way you are planning forward is to > leave unnormalized ones unchecked as the current code does (and mark the > places that do so with _unsafe()), with the eventual goal of fixing the > caller to pass only normalized ones to make call to the "safe" version? That was the plan, but after my experience trying to fix this problem I have come to doubt that it is doable within a reasonable amount of work or even that support for unnormalized refnames is desirable. The problem is that the API provided by refs.{c,h} is far from waterproof, and I keep finding more code elsewhere that manipulates and parses refnames and makes implicit assumptions (sometimes spread over many functions) about their form. Consistency of the UI should be the goal. Supporting unnormalized refnames some places, but not others, will just confuse and frustrate users. The only two consistent alternatives are 1. Unnormalized refnames are supported everywhere in the UI that refnames are accepted, including clients like gitweb, gitk, egit, etc. 2. Only normalized refnames are supported; unnormalized refnames are errors that we report on a best-effort basis. Option (1) has a number of problems: * Claiming to support unnormalized refnames is far from the current situation; therefore lots of current code would have to be considered broken. * Fixing the code requires many new unit tests and fixes to many areas of the code, including clients outside of the main git project. I have tried fixing a couple of examples ("git branch", "git rev-parse", and the first argument of "git update-ref") and it is pretty messy. * Some of the needed changes seem like they might conflict with other forms of DWIM; for example, the ambiguous_path() kludge. * The extra copying needed for normalization has a runtime cost and complicates memory management. * Unnormalized refnames are *themselves* a form of UI inconsistency and therefore not a very noble goal. It is better that people learn that each reference has a single name, and unlearn that references were once 1:1 with files under .git/refs. What is the benefit of (2) that justifies all of this work? To enable sloppy script writers to throw slashes around carelessly? Option (1) would be far easier. Then code only needs to treat unnormalized refnames like any other kind of invalid refname rather than making sure that unnormalized refnames work properly in combination with all other features. So I propose the following: * Institute a policy that refnames in the UI must always be normalized * On a best-effort basis, emit errors whenever unnormalized refnames are encountered * Continue to support "git check-ref-format --print", which script writers can use to normalize refnames if they need to. The only disadvantage of a stricter policy is that some of today's sloppy scripts, which today might sometimes work (but not reliably), break in a pretty obvious way that can be fixed with a single call to "git check-ref-format --print". I'd rather get beyond this swamp and start working on the hierarchical reference cache, which will bring some real benefits. (The hierarchical reference cache requires some sanity in refname handling, which is why I got into this mess in the first place.) What do people think? Michael -- Michael Haggerty mhagger@xxxxxxxxxxxx http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html