Re: [PATCH v2] fetch: Strip usernames from url's before storing them

Andreas Ericsson <ae@xxxxxx> · Wed, 15 Apr 2009 22:45:14 +0200

Thanks for the feedback. Many appreciated.

Junio C Hamano wrote:
Andreas Ericsson <ae@xxxxxx> writes:

+/*
+ * Strip username information from the url and return it in a
+ * newly allocated string which the caller has to free.
+ *
+ * The url's we want to catch are the following:
+ *   ssh://[user@]host.xz[:port]/path/to/repo.git/
+ *   [user@]host.xz:/path/to/repo.git/
+ *   http[s]://[user[:password]@]host.xz/path/to/repo.git

If this is a valid URL:

	scheme://host.xz/path@with@at@xxxxxxxx/

we do not want to mistakenly trigger this logic.

I'm guessing, and I'm slightly inebriated so don't yell too hard at me if
you think your mail will hit me in the morning ;-)

It *is* valid for "bare ssh" url's, but that would only trigger the fourth
(and last) case of the if() else if() thing which would return a strdup().
It won't have a scheme, and it won't have a colon after the @-sign.

An url with a scheme can't contain a colon *before* the at-sign if it does
contain a username, and it won't contain a colon *after* the @-sign if it
*does* contain a username. This is a guess, but I wrote that part of the
transport code initially, so I've got a decent inkling of what git accepts
in the first place (although it might not be strong enough in the face of
the various RFC's, I know).

I do not know if rsync://me@there/path is supported, but we should
generalize to support any scheme://me@there/path to keep the code simpler.
You do not do anything special based on the URL scheme other than learning
how long the scheme:// part is to copy it anyway.

I've never seen rsync://user@ style url's before. It's trivial to add support
for it if it's valid though, but for the reasons above, I'm not too fussed
about trying to support URL's we're extremely unlikely to see in real life.
The last else if() really does catch a lot.

 Perhaps like...

char *transport_anonymize_url(const char *url)
{
	char *anon_url, *scheme_prefix, *anon_part;
	size_t len, prefix_len = 0;

	anon_part = strchr(url, '@');
	if (is_local(url) || !anon_part)
		goto literal_copy;

	anon_part++;
	scheme_prefix = strstr(url, "://");
	if (scheme_prefix) {
		const char *cp;
		/* make sure scheme is reasonable */
		for (cp = url; cp < scheme_prefix; cp++) {
			switch (*cp) { /* RFC 1738 2.1 */
			case '+':
			case '.':
			case '-':
				break; /* ok */

Isn't %xx also a valid escape sequence for unknown chars in url's, according
to the aforementioned RFC?

			default:
				if (isalnum(*cp))
					break;
				/* it isn't */
				goto literal_copy;
			}
		}
		/* @ past the first slash does not count */
		cp = strchr(scheme_prefix + 3, '/');
		if (cp < anon_part)
			goto literal_copy;
		prefix_len = scheme_prefix - url + 3;
	}
	else if (!strchr(anon_part, ':'))
		/* cannot be "me@there:/path/name" */

Nice, but the comment is misleading to the simple (or drunk) mind.
It can't contain an @-sign at all due to the check above, which I'd
be happier if it mentions (me being both atm, I'm inclined to think
in rather straight lines).

Otherwise I really like it.

		goto literal_copy;
	len = prefix_len + strlen(anon_part);
	anon_url = xmalloc(len + 1);
	memcpy(anon_url, url, prefix_len);
	memcpy(anon_url + prefix_len, anon_part, strlen(anon_part));
	return anon_url;
 literal_copy:
	return xstrdup(url);
}

This looks sensible and fairly generic, and I'll happily defer to a
more capable and less drunk programmer any time of the day. I'll copy
it into anon-url.c tomorrow and see how it pans out with some of the
weirder URL's you gave today.

If anyone's got some tricky url's they want me to try, email me
before 08:00 GMT tomorrow and I'll give 'em a whirl with multiple
algo's.

--
Andreas Ericsson                   andreas.ericsson@xxxxxx
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html