I've tracked this back to the following commit:

commit fa8b9b971453e960062a7e677bb09a7849e59744
Author: Greg Farnum <gregf at hq.newdream.net>
Date:   Fri Apr 2 13:14:12 2010 -0700

    rgw: convert + to space in url_decode

diff --git a/src/rgw/rgw_common.cc b/src/rgw/rgw_common.cc
index 6330fe2..da9debc 100644
--- a/src/rgw/rgw_common.cc
+++ b/src/rgw/rgw_common.cc
@@ -122,7 +122,12 @@ bool url_decode(string& src_str, string& dest_str)
   while (*src) {
     if (*src != '%') {
-      dest[pos++] = *src++;
+      if (*src != '+') {
+        dest[pos++] = *src++;
+      } else {
+        dest[pos++] = ' ';
+        ++src;
+      }
     } else {
       src++;
       char c1 = hex_to_num(*src++);

I'm not sure why this was implemented, though. I would guess that this
function needs to handle URL query parameters as well as file paths, but
I don't understand the code well enough to tell.

On 6/30/2014 5:41 PM, Brian Rak wrote:
> Just for reference, I've opened http://tracker.ceph.com/issues/8702
>
> On 6/26/2014 10:18 PM, Brian Rak wrote:
>> My current workaround plan is to just upload both versions of the
>> file... I think this is probably the simplest solution with the least
>> possibility of breaking later on.
>>
>> On 6/26/2014 6:35 PM, Craig Lewis wrote:
>>> Note that wget did URL encode the space ("test file" became
>>> "test%20file"), because it knows that a space is never valid. It
>>> can't know if you meant an actual plus, or an encoded space in
>>> "test+file", so it left it alone.
>>>
>>> I will say that I would prefer that the + be left alone. If I have
>>> a static "test+file", Apache will serve that static file correctly.
>>>
>>> How badly do you need this to work, right now? If you need it now,
>>> I can suggest a workaround. This is a dirty hack, and I'm not saying
>>> it's a good idea. It's more of a thought exercise.
>>>
>>> A quick google indicates that mod_rewrite might help:
>>> http://stackoverflow.com/questions/459667/how-to-encode-special-characters-using-mod-rewrite-apache
>>> .
>>>
>>> But that might make the problem worse for other characters... If it
>>> does, I'm sure I could get it working by installing an Apache hook.
>>> Off the top of my head, I'd try a hook in
>>> http://perl.apache.org/docs/2.0/user/handlers/http.html#PerlFixupHandler to
>>> replace all + characters with the correct escape sequence, %2B. I
>>> know mod_python can hook into Apache too. I don't know if nginx has
>>> a similar capability.
>>>
>>> As with all dirty hacks, if you actually implement it, you'll want
>>> to watch the release notes. Once you work around a bug, someone
>>> will fix the bug and break your hack.
>>>
>>> On Thu, Jun 26, 2014 at 8:54 AM, Brian Rak <brak at gameservers.com>
>>> wrote:
>>>
>>>     Going back to my first post, I linked to this
>>>     http://stackoverflow.com/questions/1005676/urls-and-plus-signs
>>>
>>>     Per the definition of application/x-www-form-urlencoded:
>>>     http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
>>>
>>>     "Control names and values are escaped. Space characters are
>>>     replaced by `+', and then reserved characters are escaped as
>>>     described in [RFC1738]
>>>     <http://www.w3.org/TR/html401/references.html#ref-RFC1738>,"
>>>
>>>     The whole +=space thing is only for the query portion of the
>>>     URL, not the filename.
>>>
>>>     I've done some testing with nginx, and this is how it behaves:
>>>
>>>     On the server, somewhere in the webroot:
>>>
>>>     echo space > "test file"
>>>
>>>     Then, from a client:
>>>     $ wget --spider "http://example.com/test/test file"
>>>
>>>     Spider mode enabled. Check if remote file exists.
>>>     --2014-06-26 11:46:54-- http://example.com/test/test%20file
>>>     Connecting to example.com:80... connected.
>>>     HTTP request sent, awaiting response... 200 OK
>>>     Length: 6 [application/octet-stream]
>>>     Remote file exists.
>>>
>>>     $ wget --spider "http://example.com/test/test+file"
>>>
>>>     Spider mode enabled. Check if remote file exists.
>>>     --2014-06-26 11:46:57-- http://example.com/test/test+file
>>>     Connecting to example.com:80... connected.
>>>     HTTP request sent, awaiting response... 404 Not Found
>>>
>>>     Remote file does not exist -- broken link!!!
>>>
>>>     These tests were done just with the standard filesystem. I
>>>     wasn't using radosgw for this. Feel free to repeat with the web
>>>     server of your choice, you'll find the same thing happens.
>>>
>>>     Converting + to a space when decoding the path is not the
>>>     correct behavior.
>>>
>>> On 6/26/2014 11:36 AM, Sylvain Munaut wrote:
>>>> Hi,
>>>>
>>>>> Based on the debug log, radosgw is definitely the software that's
>>>>> incorrectly parsing the URL. For example:
>>>>>
>>>>> 2014-06-25 17:30:37.383134 7f7c6cfa9700 20
>>>>> REQUEST_URI=/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb
>>>>> 2014-06-25 17:30:37.383199 7f7c6cfa9700 10
>>>>> s->object=ubuntu/pool/main/a/adduser/adduser_3.113 nmu3ubuntu3_all.deb
>>>>> s->bucket=ubuntu
>>>>>
>>>>> I'll dig into this some more, but it definitely looks like radosgw is the
>>>>> one that's unencoding the + character here. How else would it be receiving
>>>>> the request_uri with the + in it, but then a little bit later the request
>>>>> has a space in it instead?
>>>> Note that AFAIK, in fastcgi, REQUEST_URI is _supposed_ to be a
>>>> URL-encoded version and should be URL-decoded by the fastcgi handler.
>>>> So converting the + to ' ' seems valid to me.
>>>>
>>>> Cheers,
>>>>
>>>>     Sylvain
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users at lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
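
For illustration only, here is a minimal sketch of the distinction argued
in this thread: percent-escapes are decoded everywhere, but '+' is turned
into a space only when the caller says the input is a form-encoded query
string. This is not the radosgw code. The names url_decode_sketch,
plus_as_space, and hex_value are hypothetical and do not appear in
rgw_common.cc; only the bool-returning, out-parameter shape is borrowed
from the url_decode signature shown in the diff.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from rgw_common.cc): value of a hex digit,
// or -1 if c is not a hex digit.
static int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical sketch: percent-decode src into dest.
// '+' becomes ' ' only when plus_as_space is true (query strings);
// for a path, pass false so a literal '+' is preserved.
// Returns false on a truncated or non-hex escape; dest is untouched then.
bool url_decode_sketch(const std::string& src, std::string& dest,
                       bool plus_as_space) {
  std::string out;
  out.reserve(src.size());
  for (std::size_t i = 0; i < src.size(); ++i) {
    const char c = src[i];
    if (c == '%') {
      // Need two more characters after the '%'.
      if (i + 2 >= src.size()) return false;
      const int hi = hex_value(src[i + 1]);
      const int lo = hex_value(src[i + 2]);
      if (hi < 0 || lo < 0) return false;
      out.push_back(static_cast<char>(hi * 16 + lo));
      i += 2;  // skip the two hex digits just consumed
    } else if (c == '+' && plus_as_space) {
      out.push_back(' ');
    } else {
      out.push_back(c);
    }
  }
  dest.swap(out);
  return true;
}
```

Under this sketch, decoding the REQUEST_URI path from the debug log with
plus_as_space set to false would leave adduser_3.113+nmu3ubuntu3_all.deb
intact, while "test%20file" would still decode to "test file". Whether
radosgw can tell path and query apart at the point where url_decode is
called is something I have not checked.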