Re: [PATCH] create_ref_entry(): move check_refname_format() call to callers

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 30 Apr 2012 10:14:41 -0700

Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:

> There are not two but three classes of refnames:
>
> 1. Fully legal refnames (according to the rules of check-ref-format).
>
> 2. Refnames that might cause parsing trouble in some circumstances but
> could otherwise be tolerated (with a warning) in the hope that the
> user can delete them before they cause further confusion.  These
> include all illegal refnames that aren't covered by (3).
>
> 3. Refnames that are so pathological that we don't want to let them
> into our code at all, under any circumstances.  This category, I
> think, includes refnames that include "/../" (because they could cause
> our code to walk up the filesystem) or LF (because if written to a
> packed-refs file they would make it unreadable) and perhaps "//"
> (because they would confuse the loose reference code and the
> hierarchical reference cache code).  And depending on how much you
> trust shell scripts to do quoting correctly, other patterns might also
> be risky.

I can buy that we would want to reject /../ or // from the command line
even when we are not creating such a ref (e.g. "git symbolic-ref HEAD
refs/heads/../heads/foo"). Note that even though we won't read any loose
ref of that form from the repository ever (unless your recent ref API
updates broke it ;-), we _could_ see them in .git/packed-refs if the user
manually edited the file. In such a case, I would say that recovering from
such a corruption is totally up to the user, just like it is up to you
after you edit the bits on the disk platter to confuse the filesystem
code.

But if we were to do so, class (3) needs to be tightly controlled. For
example, a ref "refs/heads/m${LF}aster", e.g.

    $ git update-ref 'refs/heads/m
    aster' HEAD

may have been usable with an ancient git, and as long as the you never
packed your refs, you should be able to see it with 'git branch', and you
should be able to fix it with 'git branch master "m${LF}aster"'.

> I would like to be able to distinguish between (2) and (3) before
> deciding what to do in any specific cases.  References in category (2)
> can probably be accepted (with a warning) in old data but we should
> not allow the user to create new ones.

There is no probably about it.  We should warn and accept---anything else
is a regression---and refuse to create.

> References in category (3) are
> more critical; I see three options for dealing with them: die(), emit
> a warning then drop them on the floor, or emit a warning but handle
> them somehow (for example, by URL-encoding them).

As long as class (3) is tightly controlled, die() until the corruption is
sorted out (if it is coming to us by reading from existing data) would be
the best option to avoid spreading the breakage further.

> Treating category (3) the same as category (2) was more or less the
> status quo before the changes, but I think it is dangerous, especially
> when dealing with references that might come from remote sources (do
> you disagree?).

I actually do. You say "parsing trouble" in (2), but what are the examples
of "parsing trouble" you have in mind?  "refs/heads/a..b" is an illegal
name per check-ref-format and is (and should be) in class (2), but as long
as we do not create such a ref, if we happen to be able to read it (maybe
"git rev-parse HEAD >.git/refs/heads/a..b" dropped it there), the user
should be able to see it with "git branch", and should be able to recover
with "git branch -m a..b a-dot-dot-b". Until the user does so, we cannot
sensibly respond to "git log a..b" (which is the "parsing trouble" you are
referring to, I think), of course, but that is to be expected.

What kind of "danger" do you have in mind?

> Regarding create_ref_entry(), there are three callers:
>
> * read_packed_refs(): Only used to read references from local
>   packed-refs files.  Seems uncritical in terms of security, so for now
>   I suppose we could change this caller to emit a warning but use the
>   reference anyway.

There is no _could_.  This must show it with a warning and let the caller
use its value.

> * get_ref_dir(): Only used to read loose references from the local
>   filesystem.

Likewise.

> * add_packed_ref(): Used by write_remote_refs() to insert references
>   from a cloned repository into the local packed-refs file.

The caller should be checking the names and rejecting the attempt to
create a ref with a bogus name.

I think create_ref_entry(), if the reader does not pay much attention to
the "_entry" part of its name, gives a false impression that this is about
"creating a ref", but it isn't.  It is actually about keeping an in-core
image of the outside world, and the one that affects the outside world
(either loose or packed refs) should be making sure the names are safe
ones to use.

Thanks for a good summary.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html