On Fri, Oct 3, 2014 at 1:25 PM, Ronnie Sahlberg <sahlberg@xxxxxxxxxx> wrote: > On Thu, Oct 2, 2014 at 11:55 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: >> Jonathan Nieder <jrnieder@xxxxxxxxx> writes: >> >>> From: Ronnie Sahlberg <sahlberg@xxxxxxxxxx> >>> ... >>> In resolving functions, refuse to resolve refs that don't pass the >>> check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME >>> flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to >>> resolve refs that escape the refs/ directory and do not match the >>> pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD"). >>> >>> In locking functions, refuse to act on badly named refs unless they >>> are being deleted and either are in the refs/ directory or match [A-Z_]*. >>> >>> Just like other invalid refs, flag resolved, badly named refs with the >>> REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them >>> in all iteration functions except for for_each_rawref. >>> >>> Flag badly named refs with a REF_BAD_NAME flag to make it easier for >>> future callers to notice and handle them specially. >>> >>> In the transaction API, refuse to create or update badly named refs, >>> but allow deleting them (unless they escape refs/ and don't match >>> [A-Z_]*). >>> >>> Signed-off-by: Ronnie Sahlberg <sahlberg@xxxxxxxxxx> >>> Signed-off-by: Jonathan Nieder <jrnieder@xxxxxxxxx> >> >> Thanks. We originally threw all the different kind of breakages >> into ISBROKEN, but a ref can have a malformed name or can contain a >> bad/non value and allowing us to tell them apart is a good direction >> to go. >> >>> diff --git a/cache.h b/cache.h >>> index 5ca7f2b..0c0ac60 100644 >>> --- a/cache.h >>> +++ b/cache.h >>> @@ -978,16 +978,26 @@ extern int read_ref(const char *refname, unsigned char *sha1); >>> * If flags is non-NULL, set the value that it points to the >>> * combination of REF_ISPACKED (if the reference was found among the >>> * packed references), REF_ISSYMREF (if the initial reference was a >>> - * symbolic reference) and REF_ISBROKEN (if the ref is malformed). >>> + * symbolic reference), REF_BAD_NAME (if the reference name is ill >>> + * formed --- see RESOLVE_REF_ALLOW_BAD_NAME below), and REF_ISBROKEN >>> + * (if the ref is malformed). >> >> You want to define "is malformed" here. >> >> The original defines REF_ISBROKEN as "malformed" because >> >> (1) resolve_ref_unsafe() uses get_sha1_hex() and read_loose_refs() >> uses read_ref_full(), both to catch "malformed values" stored; >> >> (2) resolve_ref_unsafe() uses check_refname_format() and catches >> "malformed names" stored as a symref target. >> >> I _think_ you are introducing ALLOW_BAD_NAME to allow add the third >> class ".git/refs/remotes/origin/mal..formed..name". I do not know >> if they should be the same class as a symref with a good name >> ".git/refs/remotes/origin/HEAD" that points at a bad name >> "mal..formed..name", which is (2) above). Perhaps not. (2) is >> still above what is stored in the ref, and the ref in question may >> or may not have a well-formed name, which is orthogonal. >> >> So probably you left only the "value stored in the ref is malformed" >> (in other words, "we expected to find 40-hex object name but didn't >> find one") case for REF_ISBROKEN? > > I updated cache.h to try to clarify it better. > The intention here is to expand the use of REF_ISBROKEN. > > For all cases REF_ISBROKEN will be set. This includes both "the sha1 > value is bad" as well as "a name is bad". > > For those cases where a name is bad then REF_BAD_NAME is also set. Bad > names are when either the ref itself has a bad name or when a bad name > is encountered while resolving a chain of symbilic refs. > > > I.e. REF_BAD_NAME is a special case of broken ref. All REF_BAD_NAME > refs are also REF_ISBROKEN but not the reverse. Let me add some rationale why REF_BAD_NAME and REF_ISBROKEN are notorthogonal properties. I think almost all callers of these APIs are only concerned about "is the ref good or bad" and they today only check REF_ISBROKEN. I think that is a reasonable API and it allows the majority of callers to just "check this single flag". (The alternative would be to keep all callers in sync and use a set of flags for all "bad conditions".) A very small subset of callers are actually interested in knowing why the ref was bad, and in particular if the ref was bad due to a name component. Those callers, that are aware that there are different types of ISBROKEN can then inspect the REF_BAD_NAME flag in order to decide "is the ref broken due to the ref name?". This is why I made REF_BAD_NAME a special case of REF_ISBROKEN. > > >> >> Do we want to separate "value is not 40-hex" and "a symref points at >> a malformed refname" as separate "malformed value" errors? >> >>> + * RESOLVE_REF_ALLOW_BAD_NAME allows resolving refs even when their >>> + * name is invalid according to git-check-ref-format(1). If the name >>> + * is bad then the value stored in sha1 will be null_sha1 and the >>> + * REF_ISBROKEN and REF_BAD_NAME flags will be set. >> >> Viewed with that light, I am not sure if a badly named ref should >> yield null_sha1[] (REF_ISBROKEN, which I am assuming is about a >> value that is badly formatted and cannot be read, should keep >> yielding it as before). Wouldn't it make it harder for the user if >> you give null_sha1[] back to somebody who is trying to recover by >> reading "refs/heads/mal..formed", creating "refs/heads/sensible" to >> point at the same value and then removing the former? >> >> Note that I am not saying we should give back the parsed value at >> this step in the series. Perhaps there are some existing callers >> that do not check for ISBROKEN flag and instead says "null_sha1[] >> ref is to be rejected/ignored", in which case they may need to be >> corrected before that happens. Or there may be some reason I >> overlooked that makes it not so useful if we returned the object >> name stored in a ref whose name is malformed. Just wondering. > > The reason for these malformed refs resolving to null_sha1 is exactly > that. There may be callers that do not check ISBROKEN, so all those > callers need to be carefully audited before we start returning > potentially valid non null_sha1 here. > Right now I only want to do the minimal changes needed to open up only > those few paths I need to in order to allow the refs to be deleted. > For the delete case, returning the null_sha1 is sufficient. > > > Now, IF we add support to "rename a bad-ref-name to a good-ref-name", > then for that case we would need to start allowing resolve_ref_unsafe > to actually return the sha1 for the ref. > And that would require that we do those changes and audits. Given how > rare this should be, I am not convinced we need "rename bad to good" > at this point and would thus prefer to hold off adding that. > > Thus, returning the object name is not so useful for the problem I try > to solve at this stage, namely "make delete work". > > >> >>> @@ -272,6 +272,37 @@ static struct ref_dir *get_ref_dir(struct ref_entry *entry) >>> return dir; >>> } >>> >>> +static int escapes_cwd(const char *path) { >>> + char *buf; >>> + int result; >>> + >>> + if (is_absolute_path(path)) >>> + return 1; >>> + buf = xmalloc(strlen(path) + 1); >>> + result = !!normalize_path_copy(buf, path); >>> + free(buf); >>> + return result; >>> +} >> >> I think this function is misnamed for two reasons. >> >> - It does not have anything to do with cwd; it does not make any >> difference to the outcome of this function given the same input, >> if 'pwd' says "/u/jc/git" or "/u/jc/git/Documentation", no? >> >> - Even if this had something to do with cwd, I would expect a >> function whose name is escapes_cwd("/u/jc/git/Documentation") to >> yield false when 'pwd' says "/u/jc/git", but the implementation >> unconditionally rejects absolute path. In the context of the >> (sole) caller of this function, which deals with a refname "refs/...", >> it makes no sense to see an absolute path, but that does not have >> anything to do with "does this path escape cwd?", no? >> > > Agree. > I removed this function completely and just inline the !normalize_path_copy() > checks to make sure that refs/* paths remain within refs/. > > >>> +/* >>> + * Check if a refname is safe. >>> + * For refs that start with "refs/" we consider it safe as long as the rest >>> + * of the path components does not allow it to escape from this directory. >>> + * For all other refs we only consider them safe iff they only contain >>> + * upper case characters and '_'. >>> + */ >> >> I presume that the exception is to accomodate for "HEAD", "ORIG_HEAD", >> "MERGE_HEAD" and friends, but you probably do not want the readers to >> guess. >> > > I updated the commend to highlight that this is for "HEAD" and friends. > >>> +static int refname_is_safe(const char *refname) >>> +{ >>> + if (starts_with(refname, "refs/")) >>> + return !escapes_cwd(refname + strlen("refs/")); >>> + while (*refname) { >>> + if (!isupper(*refname) && *refname != '_') >>> + return 0; >>> + refname++; >>> + } >>> + return 1; >>> +} -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html