Re: [PATCH] create_ref_entry(): move check_refname_format() call to callers

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Mon, 30 Apr 2012 18:18:53 +0200

On 04/29/2012 01:58 PM, Jeff King wrote:
On Sun, Apr 29, 2012 at 08:18:08AM +0200, mhagger@xxxxxxxxxxxx wrote:

I will work on providing more infrastructure for checking refnames at
varying levels of strictness, but I don't know enough about the code
paths to be able to find the places where the strictness levels need
tweaking.

For this to work, the various callers of check_refname_format() will
have to be able to influence the level of strictness that they want to
enforce.  This patch is one trivial step in that direction.

It seems like the create_ref_entry code paths should _always_ just be
issuing warnings, as they are about reading existing refs, no? The die()
side should only come when we are writing refs.

There are not two but three classes of refnames:

1. Fully legal refnames (according to the rules of check-ref-format).

2. Refnames that might cause parsing trouble in some circumstances but 
could otherwise be tolerated (with a warning) in the hope that the user 
can delete them before they cause further confusion.  These include all 
illegal refnames that aren't covered by (3).

3. Refnames that are so pathological that we don't want to let them into 
our code at all, under any circumstances.  This category, I think, 
includes refnames that include "/../" (because they could cause our code 
to walk up the filesystem) or LF (because if written to a packed-refs 
file they would make it unreadable) and perhaps "//" (because they would 
confuse the loose reference code and the hierarchical reference cache 
code).  And depending on how much you trust shell scripts to do quoting 
correctly, other patterns might also be risky.

I would like to be able to distinguish between (2) and (3) before 
deciding what to do in any specific cases.  References in category (2) 
can probably be accepted (with a warning) in old data but we should not 
allow the user to create new ones.  References in category (3) are more 
critical; I see three options for dealing with them: die(), emit a 
warning then drop them on the floor, or emit a warning but handle them 
somehow (for example, by URL-encoding them).

Treating category (3) the same as category (2) was more or less the 
status quo before the changes, but I think it is dangerous, especially 
when dealing with references that might come from remote sources (do you 
disagree?).

Regarding create_ref_entry(), there are three callers:

* read_packed_refs(): Only used to read references from local
  packed-refs files.  Seems uncritical in terms of security, so for now
  I suppose we could change this caller to emit a warning but use the
  reference anyway.  (It would not be a good idea to emit a warning but
  *not* use the reference, because the next time the packed-refs file
  is rewritten the reference would be lost.)

* get_ref_dir(): Only used to read loose references from the local
  filesystem.  Seems uncritical in terms of security, so for now
  I suppose we could change this caller to emit a warning but use the
  reference anyway.  Alternately, we could emit a warning but not use
  the reference; this would not result in any data loss because nobody
  would have a reason to remove the loose reference file.

* add_packed_ref(): Used by write_remote_refs() to insert references
  from a cloned repository into the local packed-refs file.  Here I
  think we have to be paranoid about accepting refnames in category
  (3).  For example, it might be reasonable to emit a warning
  mentioning the illegal reference name *and the SHA1 that it referred
  to* then to drop it on the floor.

Michael

--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html