On 2012.7.25 12:14 AM, Junio C Hamano wrote: >> Nothing, because paths are not URI escaped. :) >> >> You probably meant svn_uri_canonicalize(). And no, it does not double escape, >> so its safe to escape as early as possible. > > Are you saying that the function assumes that a local pathname would ^^^^^^^^ URI and path canonicalization are done differently and by different functions. svn_uri_canonicalize() vs svn_dirent_canonicalize(). Or maybe you're referring to the path portion of the URL? I don't think that makes a difference for what you're asking, but its important to keep in mind. > not have '%' in it, returns its input as-is when it sees one, and if > the caller really needs to express a path with '%' in it, it is the > responsibility of the caller to escape it? It appears that if the % is followed by hex it assumes its an escape. Otherwise it escapes it. Thus... http://www.google.com/per%%nt -> http://www.google.com/per%25%25nt http://www.google.com/per%ant -> http://www.google.com/per%25ant http://www.google.com/per%cent -> http://www.google.com/per%CEnt Which makes sense if the idea is to not double escape. > That makes it even more confusing.... Straight out of the RFC. http://pretty-rfc.herokuapp.com/RFC3986#when-to-percent-encode Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent- encoding, or vice versa in the case of percent-encoding an already percent-encoded string. It makes it far simpler to use. You can't read the mind of the user, but its a fair guess that they're not really thinking too deeply about how escaping works. It makes URI and path canonicalization safer and simpler. Otherwise you'd need to keep track of whether a thing was already escaped or not! Just begging for loads of bugs. (If SVN were using URI and path objects they'd just take care of it and none of this would be a problem in the first place). This way you have no double escaping concerns. No need to track if a thing is already canonicalized. Do it as often and as early as you like. Making a corner case a little harder is a small price to pay for making the common case much, much easier. This also appears to be what Firefox does. >> my $uri = "http://www.example.com/ foo"; >> >> print SVN::_Core::svn_uri_canonicalize( >> SVN::_Core::svn_uri_canonicalize($uri) >> ); >> >> That produces "http://www.example.com/%20foo". > > In other words, if your DocumentRoot was /var/www and you have a > directory /var/www/per%cent you want to expose to the outside world, > you have to say "http://www.example.com/per%25cent" yourself and the > "canonicalize" function will be an identity function? Yes. It can be made to work better. There's a number of places in the code which effectively do this: my $full_url = $url . '/' . $path; And I was canonicalizing them like this: my $full_url = canonicalize_url($url . '/' . $path); I'd been pondering whether it would be worthwhile to have a function which added a path to a base URL and canonicalized. Now I see that yes, it would be to deal with this corner case. my $full_url = append_path_to_url($url, $path); That would properly URI encode any % in $path before appending and then canonicalizing the whole thing as a URI. I'm pretty sure the code in master doesn't handle this at all. > I have this vague suspicion that Jonathan was asking about what your > Git::SVN::Utils::canonicalize_path() sub does, so all of the above > might be moot, though... Its just a pass through to the SVN API. -- 44. I am not the atheist chaplain. -- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army http://skippyslist.com/list/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html