Re: [PATCH 19/24] refs.c: allow listing and deleting badly named refs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 2, 2014 at 11:55 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jonathan Nieder <jrnieder@xxxxxxxxx> writes:
>
>> From: Ronnie Sahlberg <sahlberg@xxxxxxxxxx>
>> ...
>> In resolving functions, refuse to resolve refs that don't pass the
>> check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME
>> flag is passed.  Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to
>> resolve refs that escape the refs/ directory and do not match the
>> pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD").
>>
>> In locking functions, refuse to act on badly named refs unless they
>> are being deleted and either are in the refs/ directory or match [A-Z_]*.
>>
>> Just like other invalid refs, flag resolved, badly named refs with the
>> REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them
>> in all iteration functions except for for_each_rawref.
>>
>> Flag badly named refs with a REF_BAD_NAME flag to make it easier for
>> future callers to notice and handle them specially.
>>
>> In the transaction API, refuse to create or update badly named refs,
>> but allow deleting them (unless they escape refs/ and don't match
>> [A-Z_]*).
>>
>> Signed-off-by: Ronnie Sahlberg <sahlberg@xxxxxxxxxx>
>> Signed-off-by: Jonathan Nieder <jrnieder@xxxxxxxxx>
>
> Thanks.  We originally threw all the different kind of breakages
> into ISBROKEN, but a ref can have a malformed name or can contain a
> bad/non value and allowing us to tell them apart is a good direction
> to go.
>
>> diff --git a/cache.h b/cache.h
>> index 5ca7f2b..0c0ac60 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -978,16 +978,26 @@ extern int read_ref(const char *refname, unsigned char *sha1);
>>   * If flags is non-NULL, set the value that it points to the
>>   * combination of REF_ISPACKED (if the reference was found among the
>>   * packed references), REF_ISSYMREF (if the initial reference was a
>> - * symbolic reference) and REF_ISBROKEN (if the ref is malformed).
>> + * symbolic reference), REF_BAD_NAME (if the reference name is ill
>> + * formed --- see RESOLVE_REF_ALLOW_BAD_NAME below), and REF_ISBROKEN
>> + * (if the ref is malformed).
>
> You want to define "is malformed" here.
>
> The original defines REF_ISBROKEN as "malformed" because
>
>  (1) resolve_ref_unsafe() uses get_sha1_hex() and read_loose_refs()
>      uses read_ref_full(), both to catch "malformed values" stored;
>
>  (2) resolve_ref_unsafe() uses check_refname_format() and catches
>      "malformed names" stored as a symref target.
>
> I _think_ you are introducing ALLOW_BAD_NAME to allow add the third
> class ".git/refs/remotes/origin/mal..formed..name".  I do not know
> if they should be the same class as a symref with a good name
> ".git/refs/remotes/origin/HEAD" that points at a bad name
> "mal..formed..name", which is (2) above).  Perhaps not.  (2) is
> still above what is stored in the ref, and the ref in question may
> or may not have a well-formed name, which is orthogonal.
>
> So probably you left only the "value stored in the ref is malformed"
> (in other words, "we expected to find 40-hex object name but didn't
> find one") case for REF_ISBROKEN?

I updated cache.h to try to clarify it better.
The intention here is to expand the use of REF_ISBROKEN.

For all cases  REF_ISBROKEN will be set. This includes both "the sha1
value is bad" as well as "a name is bad".

For those cases where a name is bad then REF_BAD_NAME is also set. Bad
names are when either the ref itself has a bad name or when a bad name
is encountered while resolving a chain of symbilic refs.


I.e.  REF_BAD_NAME is a special case of broken ref. All REF_BAD_NAME
refs are also REF_ISBROKEN but not the reverse.


>
> Do we want to separate "value is not 40-hex" and "a symref points at
> a malformed refname" as separate "malformed value" errors?
>
>> + * RESOLVE_REF_ALLOW_BAD_NAME allows resolving refs even when their
>> + * name is invalid according to git-check-ref-format(1).  If the name
>> + * is bad then the value stored in sha1 will be null_sha1 and the
>> + * REF_ISBROKEN and REF_BAD_NAME flags will be set.
>
> Viewed with that light, I am not sure if a badly named ref should
> yield null_sha1[] (REF_ISBROKEN, which I am assuming is about a
> value that is badly formatted and cannot be read, should keep
> yielding it as before).  Wouldn't it make it harder for the user if
> you give null_sha1[] back to somebody who is trying to recover by
> reading "refs/heads/mal..formed", creating "refs/heads/sensible" to
> point at the same value and then removing the former?
>
> Note that I am not saying we should give back the parsed value at
> this step in the series.  Perhaps there are some existing callers
> that do not check for ISBROKEN flag and instead says "null_sha1[]
> ref is to be rejected/ignored", in which case they may need to be
> corrected before that happens.  Or there may be some reason I
> overlooked that makes it not so useful if we returned the object
> name stored in a ref whose name is malformed.  Just wondering.

The reason for these malformed refs resolving to null_sha1 is exactly
that. There may be callers that do not check ISBROKEN, so all those
callers need to be carefully audited before we start returning
potentially valid non null_sha1 here.
Right now I only want to do the minimal changes needed to open up only
those few paths I need to in order to allow the refs to be deleted.
For the delete case, returning the null_sha1 is sufficient.


Now, IF we add support to "rename a bad-ref-name to a good-ref-name",
then for that case we would need to start allowing resolve_ref_unsafe
to actually return the sha1 for the ref.
And that would require that we do those changes and audits. Given how
rare this should be, I am not convinced we need "rename bad to good"
at this point and would thus prefer to hold off adding that.

Thus, returning the object name is not so useful for the problem I try
to solve at this stage, namely "make delete work".


>
>> @@ -272,6 +272,37 @@ static struct ref_dir *get_ref_dir(struct ref_entry *entry)
>>       return dir;
>>  }
>>
>> +static int escapes_cwd(const char *path) {
>> +     char *buf;
>> +     int result;
>> +
>> +     if (is_absolute_path(path))
>> +             return 1;
>> +     buf = xmalloc(strlen(path) + 1);
>> +     result = !!normalize_path_copy(buf, path);
>> +     free(buf);
>> +     return result;
>> +}
>
> I think this function is misnamed for two reasons.
>
>  - It does not have anything to do with cwd; it does not make any
>    difference to the outcome of this function given the same input,
>    if 'pwd' says "/u/jc/git" or "/u/jc/git/Documentation", no?
>
>  - Even if this had something to do with cwd, I would expect a
>    function whose name is escapes_cwd("/u/jc/git/Documentation") to
>    yield false when 'pwd' says "/u/jc/git", but the implementation
>    unconditionally rejects absolute path.  In the context of the
>    (sole) caller of this function, which deals with a refname "refs/...",
>    it makes no sense to see an absolute path, but that does not have
>    anything to do with "does this path escape cwd?", no?
>

Agree.
I removed this function completely and just inline the !normalize_path_copy()
checks to make sure that refs/* paths remain within refs/.


>> +/*
>> + * Check if a refname is safe.
>> + * For refs that start with "refs/" we consider it safe as long as the rest
>> + * of the path components does not allow it to escape from this directory.
>> + * For all other refs we only consider them safe iff they only contain
>> + * upper case characters and '_'.
>> + */
>
> I presume that the exception is to accomodate for "HEAD", "ORIG_HEAD",
> "MERGE_HEAD" and friends, but you probably do not want the readers to
> guess.
>

I updated the commend to highlight that this is for "HEAD" and friends.

>> +static int refname_is_safe(const char *refname)
>> +{
>> +     if (starts_with(refname, "refs/"))
>> +             return !escapes_cwd(refname + strlen("refs/"));
>> +     while (*refname) {
>> +             if (!isupper(*refname) && *refname != '_')
>> +                     return 0;
>> +             refname++;
>> +     }
>> +     return 1;
>> +}
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]