Problem with RadosGW and special characters

I've tracked this back to the following commit:

commit fa8b9b971453e960062a7e677bb09a7849e59744
Author: Greg Farnum <gregf at hq.newdream.net>
Date:   Fri Apr 2 13:14:12 2010 -0700

     rgw: convert + to space in url_decode

diff --git a/src/rgw/rgw_common.cc b/src/rgw/rgw_common.cc
index 6330fe2..da9debc 100644
--- a/src/rgw/rgw_common.cc
+++ b/src/rgw/rgw_common.cc
@@ -122,7 +122,12 @@ bool url_decode(string& src_str, string& dest_str)

    while (*src) {
      if (*src != '%') {
-      dest[pos++] = *src++;
+      if (*src != '+') {
+       dest[pos++] = *src++;
+      } else {
+       dest[pos++] = ' ';
+       ++src;
+      }
      } else {
        src++;
        char c1 = hex_to_num(*src++);


I'm not sure why this was implemented, though. My guess is that this 
function has to decode URL query parameters as well as object paths, 
but I don't understand the code well enough to tell.
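
To illustrate the distinction, here is a minimal sketch of a decoder that 
only turns + into a space when the caller says it is handling a query 
string. This is not the radosgw code: DecodeMode, hex_value and 
url_decode_sketch are names made up for this example, and it assumes the 
caller already knows which part of the URL it is decoding.

```cpp
#include <string>

// Hypothetical names for illustration only; none of this is in radosgw.
enum class DecodeMode { Path, Query };

// Returns the numeric value of a hex digit, or -1 if c is not one.
static int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Percent-decodes src into dest. A '+' becomes a space only in Query mode;
// in Path mode it is copied through unchanged.
// Returns false on a truncated or malformed %XX escape.
bool url_decode_sketch(const std::string& src, std::string& dest,
                       DecodeMode mode) {
  dest.clear();
  dest.reserve(src.size());
  for (std::string::size_type i = 0; i < src.size(); ++i) {
    const char c = src[i];
    if (c == '%') {
      // Need two more characters after the '%'.
      if (i + 2 >= src.size()) return false;
      const int hi = hex_value(src[i + 1]);
      const int lo = hex_value(src[i + 2]);
      if (hi < 0 || lo < 0) return false;
      dest.push_back(static_cast<char>(hi * 16 + lo));
      i += 2;
    } else if (c == '+' && mode == DecodeMode::Query) {
      dest.push_back(' ');
    } else {
      dest.push_back(c);
    }
  }
  return true;
}
```

With something like this, an object name such as 
adduser_3.113+nmu3ubuntu3_all.deb would keep its + when decoded in Path 
mode, while test%20file would still decode to "test file".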

On 6/30/2014 5:41 PM, Brian Rak wrote:
> Just for reference, I've opened http://tracker.ceph.com/issues/8702
>
> On 6/26/2014 10:18 PM, Brian Rak wrote:
>> My current workaround plan is to just upload both versions of the 
>> file... I think this is probably the simplest solution with the least 
>> possibility of breaking later on.
>>
>> On 6/26/2014 6:35 PM, Craig Lewis wrote:
>>> Note that wget did URL encode the space ("test file" became 
>>> "test%20file"), because it knows that a space is never valid.  It 
>>> can't know if you meant an actual plus, or an encoded space in 
>>> "test+file", so it left it alone.
>>>
>>> I will say that I would prefer that the + be left alone.  If I have 
>>> a static "test+file", Apache will serve that static file correctly.
>>>
>>>
>>>
>>> How badly do you need this to work, right now?  If you need it now, 
>>> I can suggest a workaround.  This is a dirty hack, and I'm not saying 
>>> it's a good idea.  It's more of a thought exercise.
>>>
>>> A quick google indicates that mod_rewrite might help: 
>>> http://stackoverflow.com/questions/459667/how-to-encode-special-characters-using-mod-rewrite-apache 
>>> .
>>>
>>> But that might make the problem worse for other characters... If it 
>>> does, I'm sure I could get it working by installing an Apache hook. 
>>>  Off the top of my head, I'd try a hook in 
>>> http://perl.apache.org/docs/2.0/user/handlers/http.html#PerlFixupHandler to 
>>> replace all + characters with the correct escape sequence, %2B. I 
>>> know mod_python can hook into Apache too.  I don't know if nginx has 
>>> a similar capability.
>>>
>>>
>>> As with all dirty hacks, if you actually implement it, you'll want 
>>> to watch the release notes.  Once you work around a bug, someone 
>>> will fix the bug and break your hack.
>>>
>>>
>>>
>>>
>>> On Thu, Jun 26, 2014 at 8:54 AM, Brian Rak <brak at gameservers.com> wrote:
>>>
>>>     Going back to my first post, I linked to this
>>>     http://stackoverflow.com/questions/1005676/urls-and-plus-signs
>>>
>>>     Per the definition of application/x-www-form-urlencoded:
>>>     http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
>>>
>>>     "Control names and values are escaped. Space characters are
>>>     replaced by `+', and then reserved characters are escaped as
>>>     described in [RFC1738]
>>>     <http://www.w3.org/TR/html401/references.html#ref-RFC1738>,"
>>>
>>>     The whole +=space thing is only for the query portion of the
>>>     URL, not the filename.
>>>
>>>     I've done some testing with nginx, and this is how it behaves:
>>>
>>>     On the server, somewhere in the webroot:
>>>
>>>     echo space > "test file"
>>>
>>>     Then, from a client:
>>>     $ wget --spider "http://example.com/test/test file"
>>>
>>>     Spider mode enabled. Check if remote file exists.
>>>     --2014-06-26 11:46:54-- http://example.com/test/test%20file
>>>     Connecting to example.com:80... connected.
>>>     HTTP request sent, awaiting response... 200 OK
>>>     Length: 6 [application/octet-stream]
>>>     Remote file exists.
>>>
>>>     $ wget --spider "http://example.com/test/test+file"
>>>
>>>     Spider mode enabled. Check if remote file exists.
>>>     --2014-06-26 11:46:57-- http://example.com/test/test+file
>>>     Connecting to example.com:80... connected.
>>>     HTTP request sent, awaiting response... 404 Not Found
>>>
>>>     Remote file does not exist -- broken link!!!
>>>
>>>     These tests were done just with the standard filesystem.  I
>>>     wasn't using radosgw for this.  Feel free to repeat with the web
>>>     server of your choice, you'll find the same thing happens.
>>>
>>>     Applying form-style decoding (+ to space) to the path is not the
>>>     correct behavior.
>>>
>>>
>>>
>>>     On 6/26/2014 11:36 AM, Sylvain Munaut wrote:
>>>>     Hi,
>>>>
>>>>>     Based on the debug log, radosgw is definitely the software that's
>>>>>     incorrectly parsing the URL.  For example:
>>>>>
>>>>>
>>>>>     2014-06-25 17:30:37.383134 7f7c6cfa9700 20
>>>>>     REQUEST_URI=/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb
>>>>>     2014-06-25 17:30:37.383199 7f7c6cfa9700 10
>>>>>     s->object=ubuntu/pool/main/a/adduser/adduser_3.113 nmu3ubuntu3_all.deb
>>>>>     s->bucket=ubuntu
>>>>>
>>>>>     I'll dig into this some more, but it definitely looks like radosgw is the
>>>>>     one that's decoding the + character here.  How else would it be receiving
>>>>>     the request_uri with the + in it, but then a little bit later the request
>>>>>     has a space in it instead?
>>>>     Note that AFAIK, in fastcgi, REQUEST_URI is _supposed_ to be a
>>>>     URL-encoded version and should be URL-decoded by the fastcgi handler. So
>>>>     converting the + to ' ' seems valid to me.
>>>>
>>>>
>>>>     Cheers,
>>>>
>>>>         Sylvain
>>>
>>>
>>>     _______________________________________________
>>>     ceph-users mailing list
>>>     ceph-users at lists.ceph.com <mailto:ceph-users at lists.ceph.com>
>>>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>
>
>
