Re: radosgw daemon stalls on download of some files

Hi,

Our configuration:
server {
        listen  80;
        listen  [::]:80;

        server_name     radosgw-prod;

        client_max_body_size 1000m;
        error_log   /var/log/nginx/radosgw-prod-error.log;
        access_log  off;

        location / {
                fastcgi_pass_header     Authorization;
                fastcgi_pass_request_headers on;

                # Hand PUT requests over to the internal /PUT/ location below.
                if ($request_method  = PUT ) {
                        rewrite ^       /PUT$request_uri;
                }

                include fastcgi_params;
                client_max_body_size    0;

                fastcgi_busy_buffers_size 512k;
                fastcgi_buffer_size 512k;
                fastcgi_buffers 16 512k;
                fastcgi_read_timeout 2s;
                fastcgi_send_timeout 1s;
                fastcgi_connect_timeout 1s;

                # On an error, timeout, 500 or 503, retry the request on the
                # next server of the ceph-rgw upstream.
                fastcgi_next_upstream error timeout http_500 http_503;
                fastcgi_pass ceph-rgw;
        }

        # Only reachable through the rewrite above (internal); this is the
        # only location that sets CONTENT_LENGTH explicitly.
        location /PUT/ {
                internal;
                fastcgi_pass_header     Authorization;
                fastcgi_pass_request_headers on;

                include fastcgi_params;
                client_max_body_size    0;
                fastcgi_param  CONTENT_LENGTH   $content_length;

                fastcgi_busy_buffers_size 512k;
                fastcgi_buffer_size 512k;
                fastcgi_buffers 16 512k;

                fastcgi_pass ceph-rgw;
        }
}


Content-Length is only sent with PUT requests because there was an issue with older versions of the RADOS gateway.

DON'T activate keepalive: when the keepalive option is enabled, connections are not closed on the radosgw side, which leads to too many open connections on the rgw.
We use this configuration with a TCP socket, not with a local one.
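To watch for the keepalive symptom described above (connections piling up on the rgw side), one option is to count ESTABLISHED TCP connections on the gateway's FastCGI port. The following is a minimal Python sketch, assuming a Linux gateway host where /proc/net/tcp is readable and the FastCGI socket is IPv4; the port 9000 in the usage note is hypothetical, so substitute whatever port the ceph-rgw upstream points to.

```python
from typing import Iterable

# State code for ESTABLISHED in /proc/net/tcp (hex, as printed by the kernel).
TCP_ESTABLISHED = "01"


def count_established(proc_net_tcp_lines: Iterable[str], port: int) -> int:
    """Count ESTABLISHED IPv4 connections whose *local* port equals `port`.

    Expects the lines of /proc/net/tcp, header row included.
    Run it on the radosgw host, where the FastCGI port is the local port.
    """
    count = 0
    for line in proc_net_tcp_lines:
        fields = line.split()
        # Data rows look like: "0: 0100007F:2328 0100007F:D431 01 ..."
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue  # skips the header row ("sl local_address ...")
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count


def established_on_port(port: int, path: str = "/proc/net/tcp") -> int:
    """Read the kernel's TCP table from `path` and count connections on `port`."""
    with open(path) as table:
        return count_established(table, port)
```

For example, calling `established_on_port(9000)` every few minutes: a count that keeps growing while the request rate stays flat would be consistent with connections not being closed by radosgw.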

-----Original Message-----
From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Sebastian
Sent: Wednesday, 4 December 2013 10:29
To: ceph-users
Subject: Re: radosgw daemon stalls on download of some files

Hi,

we are currently using the patched FastCGI version (2.4.7-0910042141-6-gd4fffda). Updating to a more recent version is currently blocked by http://tracker.ceph.com/issues/6453

Is there any documentation for running radosgw with nginx? I can only find some mailing-list posts with config snippets.

Sebastian

On 30.11.2013, at 20:46, Andrew Woodward wrote:

> Are you using the Inktank-patched FastCGI server?
> http://gitbuilder.ceph.com
> 
> Alternatively, try another server such as nginx, as already suggested.
> 
> On Nov 29, 2013 12:23 PM, "German Anders" <ganders@xxxxxxxxxxxx> wrote:
> Thanks a lot, Sebastian, I'm going to try that. I'm also having an issue while trying to test RBD image creation. I've installed the Ceph client on the deploy server:
> 
> ceph@ceph-deploy01:/etc/ceph$ sudo rbd -n client.ceph-test -k /home/ceph/ceph-cluster/ceph.client.admin.keyring create --size 10240 cephdata
> 2013-11-29 15:20:25.683930 7fcd9979c780  0 librados: client.ceph-openstack authentication error (1) Operation not permitted
> rbd: couldn't connect to the cluster!
> 
> Does anyone know what the issue could be here? Maybe it has something to do with the keys, or maybe not...
> 
> Thanks in advance,
> 
> Best regards,
>  
> German Anders
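One thing worth ruling out is a mismatch between the entity name and the keyring: the command passes `-n client.ceph-test` but points `-k` at `ceph.client.admin.keyring`, and librados then reports yet another name, `client.ceph-openstack`. A Ceph keyring is an INI-like file with one `[entity.name]` section per key, so a minimal Python sketch (an illustration only, not a Ceph tool; the helper names are hypothetical) can list which entities a keyring actually holds:

```python
import re
from typing import List

# A keyring section header such as "[client.admin]".
SECTION_RE = re.compile(r"^\s*\[([^\]]+)\]\s*$")


def keyring_entities(keyring_text: str) -> List[str]:
    """Return the entity names defined in a Ceph keyring, in file order."""
    entities = []
    for line in keyring_text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            entities.append(match.group(1).strip())
    return entities


def has_key_for(keyring_text: str, entity: str) -> bool:
    """True if the keyring has a section for `entity` (e.g. the name given with -n)."""
    return entity in keyring_entities(keyring_text)
```

If `has_key_for(open("/home/ceph/ceph-cluster/ceph.client.admin.keyring").read(), "client.ceph-test")` returns False, that would suggest either running with `-n client.admin` or pointing `-k` at a keyring that holds a key for `client.ceph-test`.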
>
>> --- Original message ---
>> Subject: Re: radosgw daemon stalls on download of some files
>> From: Sebastian <webmaster@xxxxxxxx>
>> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>> Date: Friday, 29/11/2013 16:18
>> 
>> Hi Yehuda,
>> 
>> 
>>> It's interesting: the responses are received, but it seems that they
>>> aren't being handled (hence the following pings). There are a few
>>> things that you could look at. First, try to connect to the admin
>>> socket and see if you get any useful information from there. This
>>> could include in-flight requests; look for other requests that have
>>> not completed. Also see if there's any indication of request throttling.
>> 
>> Are you referring to the methods mentioned at http://ceph.com/docs/dumpling/radosgw/troubleshooting/ ?
>> Unfortunately, the socket file is not present. Do I have to activate it in the config somehow? I could not find any reference to that in the docs. Is it already included in my radosgw version?
>> radosgw -v
>> ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
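On the missing socket: Ceph daemons read the `admin socket` option from their ceph.conf section, and the path may contain metavariables such as `$cluster` and `$name`. The Python sketch below reads that option for the gateway's section and expands those metavariables, so you can check whether the file should exist. It assumes a gateway section named `[client.radosgw.gateway]` (hypothetical, use your own) and only expands `$cluster`, `$name`, `$type` and `$id`; whether 0.67 creates a socket for radosgw without this option being set is not confirmed here.

```python
import configparser
from typing import Optional


def expand_metavars(template: str, cluster: str, name: str) -> str:
    """Expand $cluster, $name, $type and $id as used in ceph.conf paths.

    `name` is the full entity name, e.g. "client.radosgw.gateway":
    $type is the part before the first dot, $id is the rest.
    """
    daemon_type, _, daemon_id = name.partition(".")
    return (template.replace("$cluster", cluster)
            .replace("$name", name)
            .replace("$type", daemon_type)
            .replace("$id", daemon_id))


def admin_socket_path(conf_text: str, section: str,
                      cluster: str = "ceph") -> Optional[str]:
    """Return the expanded `admin socket` path for `section`, or None if unset."""
    # interpolation=None leaves "$" and "%" alone; strict=False tolerates duplicates.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        return None
    # ceph.conf treats spaces and underscores in option names as equivalent.
    for key in ("admin socket", "admin_socket"):
        if parser.has_option(section, key):
            return expand_metavars(parser.get(section, key), cluster, section)
    return None
```

For example, `admin_socket_path(open("/etc/ceph/ceph.conf").read(), "client.radosgw.gateway")` returning None would be consistent with no socket being created; adding an `admin socket` line to that section and restarting radosgw would then be worth trying.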
>> 
>>> Another thing to look at would be the seemingly unrelated timeout
>>> messages. These should not happen and might indicate that something
>>> is holding you up that shouldn't be. Try searching for the same
>>> thread id that is specified in these messages (omit the 0x prefix),
>>> and see what it was doing last.
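The search suggested above can be scripted: take the thread ids from the `heartbeat_map ... had timed out` lines, drop the 0x prefix, and find the last line each of those threads logged. A minimal Python sketch, assuming the thread id appears as its own whitespace-separated field in every radosgw log line (as in the excerpts quoted in this thread) and that the log fits in memory:

```python
import re
from typing import Dict, List, Optional

# Matches e.g. "... 'RGWProcess::m_tp thread 0x7f00934bb700' had timed out after 600"
TIMED_OUT_RE = re.compile(r"thread 0x([0-9a-f]+)' had timed out")


def timed_out_threads(lines: List[str]) -> List[str]:
    """Thread ids (without the 0x prefix) reported as timed out, in first-seen order."""
    ids: List[str] = []
    for line in lines:
        match = TIMED_OUT_RE.search(line)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


def last_line_of_thread(lines: List[str], thread_id: str) -> Optional[str]:
    """Last line in which `thread_id` appears as a standalone field.

    The heartbeat line itself only contains "0x<id>'", so it is not matched.
    """
    last = None
    for line in lines:
        if thread_id in line.split():
            last = line
    return last


def stalled_thread_report(lines: List[str]) -> Dict[str, Optional[str]]:
    """Map each timed-out thread id to the last line that thread logged."""
    return {tid: last_line_of_thread(lines, tid) for tid in timed_out_threads(lines)}
```

Usage, with a hypothetical log path: `stalled_thread_report(open("/var/log/ceph/radosgw.log").read().splitlines())`.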
>> 
>> I checked that:
>> http://pastebin.com/Z23PWwjt
>> I do not see anything unusual before the messages appear, but maybe you will spot something odd.
>> 
>> 
>>> You could also try turning on 'debug objecter = 20' and see if it
>>> provides more info (it's very verbose, though).
>>> 
>> 
>> Did that, but that is way too verbose for me ;) I uploaded it here:
>> http://pastebin.com/VBPAVP6z
>> There might be some requests mixed into it, but the one for cdn/52974400c6dd6ca719000004/source.avi is the one that stalled. 
>> 
>>> How much load is on the gateway before that happens? We've seen
>>> a similar issue in the past that was related to the fcgi library
>>> that is dynamically linked with the radosgw process (that is, not
>>> the Apache mod_fastcgi module). This, however, would only happen
>>> under heavy load, when the fd numbers handled by radosgw
>>> surpassed 1024 (a buggy library that used select() instead of poll()).
>> 
>> There are not that many requests on the storage, maybe 10-20 req/min. The cluster serves as a source for a CDN, so once a resource is fetched it should not be fetched again soon. I checked the open files, and there are only about 10-20 open file handles for the radosgw process, so this is probably not the issue.
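The fd condition mentioned above concerns descriptor numbers, not only their count: select() can only watch descriptors numbered below FD_SETSIZE, which is 1024 with glibc. A minimal Python sketch for Linux, assuming /proc/<pid>/fd is readable (usually this means running as the radosgw user or as root) and that the pid is the one reported by `pidof radosgw`:

```python
import os
from typing import Iterable, Tuple

# glibc's select() cannot monitor descriptors numbered FD_SETSIZE or higher.
FD_SETSIZE = 1024


def highest_fd(fd_names: Iterable[str]) -> int:
    """Highest descriptor number among /proc/<pid>/fd entries (-1 if there are none)."""
    return max((int(name) for name in fd_names), default=-1)


def select_safe(fd_names: Iterable[str]) -> bool:
    """True if every open descriptor could still be passed to select()."""
    return highest_fd(fd_names) < FD_SETSIZE


def fd_summary(pid: int) -> Tuple[int, int, bool]:
    """(open descriptor count, highest descriptor number, select()-safe?) for `pid`."""
    names = os.listdir(f"/proc/{pid}/fd")
    return len(names), highest_fd(names), select_safe(names)
```

With only 10-20 open handles, as reported here, the highest number should normally be far below 1024, which fits with ruling this cause out.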
>> 
>> Sebastian
>> 
>> 
>>> 
>>> Yehuda
>>> 
>>> On Fri, Nov 29, 2013 at 7:28 AM, Sebastian <webmaster@xxxxxxxx> wrote:
>>>> Hi,
>>>> 
>>>> thanks for the hint. I tried this again and noticed that the timeout message does seem to be unrelated. Here is the log file for a stalling request with debug turned on:
>>>> http://pastebin.com/DcQuc9wP
>>>> 
>>>> I cannot really find a real "error" in the log. The download stalls at about 500 KB at that point, though. Restarting radosgw fixes it for one download only; the next one is broken again. But as I said, this does not happen for all files.
>>>> 
>>>> Sebastian
>>>> 
>>>> On 27.11.2013, at 21:53, Yehuda Sadeh wrote:
>>>> 
>>>>> On Wed, Nov 27, 2013 at 4:46 AM, Sebastian <webmaster@xxxxxxxx> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> we have a setup of 4 servers running Ceph and radosgw. We use it as an internal S3 service for our files. The servers run Debian Squeeze with Ceph 0.67.4.
>>>>>> 
>>>>>> The cluster has been running smoothly for quite a while, but we are currently experiencing issues with radosgw. For some files, the HTTP download just stalls at around 500 KB.
>>>>>> 
>>>>>> The Apache error log just says:
>>>>>> [error] [client ] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
>>>>>> [error] [client ] Handler for fastcgi-script returned invalid result code 1
>>>>>> 
>>>>>> radosgw logging:
>>>>>> 7f00bc66a700 1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f00934bb700' had timed out after 600
>>>>>> 7f00bc66a700 1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f00ab4eb700' had timed out after 600
>>>>>> 
>>>>>> The interesting thing is that the cluster health is fine and only some files are not working properly. Most of them work just fine. A restart of radosgw fixes the issue. The other Ceph logs are also clean.
>>>>>> 
>>>>>> Any idea why this happens?
>>>>>> 
>>>>> 
>>>>> No, but you can turn on 'debug ms = 1' in your gateway's ceph.conf,
>>>>> and that might give a better indication.
>>>>> 
>>>>> Yehuda
>>>> 
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
