4 incomplete PGs causing RGW to go offline?


 



We have 3 RadosGW servers running behind HAProxy so that clients can connect to the Ceph cluster like an Amazon bucket.  After all the failures and upgrade issues were resolved, I still cannot get the RadosGW servers to stay online.  They were upgraded to Luminous; I even upgraded the OS on them to Ubuntu 16 (before upgrading to Luminous).  They used to have Apache on them, as they ran Hammer and, before that, Firefly.  I removed Apache before upgrading to Luminous.  They start up and run for about 4-6 hours before all three start to go offline.  Client traffic is light right now, as we are just testing file reads/writes before we reactivate them (the clients switched back to Amazon while we fix the servers).

 

Could the 4 incomplete PGs be causing them to go offline?  The last time I saw an issue like this was when recovery wasn't working 100%, so it seems related, since they haven't been stable since we upgraded (but that was also after the failures we had, which is why I am not trying to blame the upgrade specifically).

 

When I look at the radosgw log, this is what I see (the first lines show up plenty of times before this; they are health checks by the HAProxy servers. The next ones are file requests that fail with a 404, I am guessing, and the last lines are me restarting the service):

 

2018-01-11 20:14:36.640577 7f5826aa3700  1 ====== req done req=0x7f5826a9d1f0 op status=0 http_status=200 ======

2018-01-11 20:14:36.640602 7f5826aa3700  1 civetweb: 0x56202c567000: 192.168.120.21 - - [11/Jan/2018:20:14:36 +0000] "HEAD / HTTP/1.0" 1 0 - -

2018-01-11 20:14:36.640835 7f5816282700  1 ====== req done req=0x7f581627c1f0 op status=0 http_status=200 ======

2018-01-11 20:14:36.640859 7f5816282700  1 civetweb: 0x56202c610000: 192.168.120.22 - - [11/Jan/2018:20:14:36 +0000] "HEAD / HTTP/1.0" 1 0 - -

2018-01-11 20:14:36.761917 7f5835ac1700  1 ====== starting new request req=0x7f5835abb1f0 =====

2018-01-11 20:14:36.763936 7f5835ac1700  1 ====== req done req=0x7f5835abb1f0 op status=0 http_status=404 ======

2018-01-11 20:14:36.763983 7f5835ac1700  1 civetweb: 0x56202c4ce000: 192.168.120.21 - - [11/Jan/2018:20:14:36 +0000] "HEAD /Jobimages/vendor05/10/3962896/3962896_cover.pdf HTTP/1.1" 1 0 - aws-sdk-dotnet-35/2.0.2.2 .NET Runtime/4.0 .NET Framework/4.0 OS/6.2.9200.0 FileIO

2018-01-11 20:14:36.772611 7f5808266700  1 ====== starting new request req=0x7f58082601f0 =====

2018-01-11 20:14:36.773733 7f5808266700  1 ====== req done req=0x7f58082601f0 op status=0 http_status=404 ======

2018-01-11 20:14:36.773769 7f5808266700  1 civetweb: 0x56202c6aa000: 192.168.120.21 - - [11/Jan/2018:20:14:36 +0000] "HEAD /Jobimages/vendor05/10/3962896/3962896_cover.pdf HTTP/1.1" 1 0 - aws-sdk-dotnet-35/2.0.2.2 .NET Runtime/4.0 .NET Framework/4.0 OS/6.2.9200.0 FileIO

2018-01-11 20:14:38.163617 7f5836ac3700  1 ====== starting new request req=0x7f5836abd1f0 =====

2018-01-11 20:14:38.165352 7f5836ac3700  1 ====== req done req=0x7f5836abd1f0 op status=0 http_status=404 ======

2018-01-11 20:14:38.165401 7f5836ac3700  1 civetweb: 0x56202c4e2000: 192.168.120.21 - - [11/Jan/2018:20:14:38 +0000] "HEAD /Jobimages/vendor05/10/3445645/3445645_cover.pdf HTTP/1.1" 1 0 - aws-sdk-dotnet-35/2.0.2.2 .NET Runtime/4.0 .NET Framework/4.0 OS/6.2.9200.0 FileIO

2018-01-11 20:14:38.170551 7f5807a65700  1 ====== starting new request req=0x7f5807a5f1f0 =====

2018-01-11 20:14:40.322236 7f58352c0700  1 ====== starting new request req=0x7f58352ba1f0 =====

2018-01-11 20:14:40.323468 7f5834abf700  1 ====== starting new request req=0x7f5834ab91f0 =====

2018-01-11 20:14:41.643365 7f58342be700  1 ====== starting new request req=0x7f58342b81f0 =====

2018-01-11 20:14:41.643358 7f58312b8700  1 ====== starting new request req=0x7f58312b21f0 =====

2018-01-11 20:14:50.324196 7f5829aa9700  1 ====== starting new request req=0x7f5829aa31f0 =====

2018-01-11 20:14:50.325622 7f58332bc700  1 ====== starting new request req=0x7f58332b61f0 =====

2018-01-11 20:14:51.645678 7f58362c2700  1 ====== starting new request req=0x7f58362bc1f0 =====

2018-01-11 20:14:51.645671 7f582e2b2700  1 ====== starting new request req=0x7f582e2ac1f0 =====

2018-01-11 20:15:00.326452 7f5815a81700  1 ====== starting new request req=0x7f5815a7b1f0 =====

2018-01-11 20:15:00.328787 7f5828aa7700  1 ====== starting new request req=0x7f5828aa11f0 =====

2018-01-11 20:15:01.648196 7f580ea73700  1 ====== starting new request req=0x7f580ea6d1f0 =====

2018-01-11 20:15:01.648698 7f5830ab7700  1 ====== starting new request req=0x7f5830ab11f0 =====

2018-01-11 20:15:10.328810 7f5832abb700  1 ====== starting new request req=0x7f5832ab51f0 =====

2018-01-11 20:15:10.329541 7f582f2b4700  1 ====== starting new request req=0x7f582f2ae1f0 =====

2018-01-11 20:15:11.650655 7f582d2b0700  1 ====== starting new request req=0x7f582d2aa1f0 =====

2018-01-11 20:15:11.651401 7f582aaab700  1 ====== starting new request req=0x7f582aaa51f0 =====

2018-01-11 20:15:20.332032 7f582c2ae700  1 ====== starting new request req=0x7f582c2a81f0 =====

2018-01-11 20:15:20.332046 7f582b2ac700  1 ====== starting new request req=0x7f582b2a61f0 =====

2018-01-11 20:15:21.653675 7f582229a700  1 ====== starting new request req=0x7f58222941f0 =====

2018-01-11 20:15:21.655867 7f5821a99700  1 ====== starting new request req=0x7f5821a931f0 =====

2018-01-11 20:15:30.334192 7f580ba6d700  1 ====== starting new request req=0x7f580ba671f0 =====

2018-01-11 20:15:30.334263 7f58252a0700  1 ====== starting new request req=0x7f582529a1f0 =====

2018-01-11 20:15:31.656023 7f582329c700  1 ====== starting new request req=0x7f58232961f0 =====

2018-01-11 20:15:31.658730 7f5825aa1700  1 ====== starting new request req=0x7f5825a9b1f0 =====

2018-01-11 20:15:40.346908 7f5827aa5700  1 ====== starting new request req=0x7f5827a9f1f0 =====

2018-01-11 20:15:40.346968 7f582429e700  1 ====== starting new request req=0x7f58242981f0 =====

2018-01-11 20:15:41.659509 7f5820296700  1 ====== starting new request req=0x7f58202901f0 =====

2018-01-11 20:15:41.661910 7f5806262700  1 ====== starting new request req=0x7f580625c1f0 =====

2018-01-11 20:15:50.339676 7f5820a97700  1 ====== starting new request req=0x7f5820a911f0 =====

2018-01-11 20:15:50.340447 7f5833abd700  1 ====== starting new request req=0x7f5833ab71f0 =====

2018-01-11 20:15:51.661637 7f581b28c700  1 ====== starting new request req=0x7f581b2861f0 =====

2018-01-11 20:15:51.665464 7f5824a9f700  1 ====== starting new request req=0x7f5824a991f0 =====

2018-01-11 20:16:00.342250 7f581fa95700  1 ====== starting new request req=0x7f581fa8f1f0 =====

2018-01-11 20:16:00.342296 7f580aa6b700  1 ====== starting new request req=0x7f580aa651f0 =====

2018-01-11 20:16:01.663620 7f581ea93700  1 ====== starting new request req=0x7f581ea8d1f0 =====

2018-01-11 20:16:01.668467 7f582a2aa700  1 ====== starting new request req=0x7f582a2a41f0 =====

2018-01-11 20:16:10.344220 7f58302b6700  1 ====== starting new request req=0x7f58302b01f0 =====

2018-01-11 20:16:10.345422 7f581ba8d700  1 ====== starting new request req=0x7f581ba871f0 =====

2018-01-11 20:16:11.664968 7f582baad700  1 ====== starting new request req=0x7f582baa71f0 =====

2018-01-11 20:16:11.671974 7f582dab1700  1 ====== starting new request req=0x7f582daab1f0 =====

2018-01-11 20:16:20.345984 7f5810276700  1 ====== starting new request req=0x7f58102701f0 =====

2018-01-11 20:16:20.346372 7f581f294700  1 ====== starting new request req=0x7f581f28e1f0 =====

2018-01-11 20:16:21.667324 7f5819a89700  1 ====== starting new request req=0x7f5819a831f0 =====

2018-01-11 20:16:21.675243 7f5823a9d700  1 ====== starting new request req=0x7f5823a971f0 =====

2018-01-11 20:16:30.347943 7f58292a8700  1 ====== starting new request req=0x7f58292a21f0 =====

2018-01-11 20:16:30.348865 7f581a28a700  1 ====== starting new request req=0x7f581a2841f0 =====

2018-01-11 20:16:31.670269 7f580f274700  1 ====== starting new request req=0x7f580f26e1f0 =====

2018-01-11 20:16:31.678598 7f5818286700  1 ====== starting new request req=0x7f58182801f0 =====

2018-01-11 20:16:40.350418 7f58272a4700  1 ====== starting new request req=0x7f582729e1f0 =====

2018-01-11 20:16:40.351565 7f582eab3700  1 ====== starting new request req=0x7f582eaad1f0 =====

2018-01-11 20:16:41.671624 7f581e292700  1 ====== starting new request req=0x7f581e28c1f0 =====

2018-01-11 20:16:41.682522 7f5819288700  1 ====== starting new request req=0x7f58192821f0 =====

2018-01-11 20:16:50.352821 7f5817a85700  1 ====== starting new request req=0x7f5817a7f1f0 =====

2018-01-11 20:16:50.357997 7f5806a63700  1 ====== starting new request req=0x7f5806a5d1f0 =====

2018-01-11 20:16:51.674867 7f581227a700  1 ====== starting new request req=0x7f58122741f0 =====

2018-01-11 20:16:51.685882 7f5811a79700  1 ====== starting new request req=0x7f5811a731f0 =====

2018-01-11 20:17:00.356027 7f5812a7b700  1 ====== starting new request req=0x7f5812a751f0 =====

2018-01-11 20:17:00.360732 7f581c28e700  1 ====== starting new request req=0x7f581c2881f0 =====

2018-01-11 20:17:01.678524 7f5815280700  1 ====== starting new request req=0x7f581527a1f0 =====

2018-01-11 20:17:01.689199 7f5816a83700  1 ====== starting new request req=0x7f5816a7d1f0 =====

2018-01-11 20:17:10.358813 7f580fa75700  1 ====== starting new request req=0x7f580fa6f1f0 =====

2018-01-11 20:17:10.363121 7f581da91700  1 ====== starting new request req=0x7f581da8b1f0 =====

2018-01-11 20:17:11.682017 7f581427e700  1 ====== starting new request req=0x7f58142781f0 =====

2018-01-11 20:17:11.693168 7f5811278700  1 ====== starting new request req=0x7f58112721f0 =====

2018-01-11 20:17:20.366413 7f5809a69700  1 ====== starting new request req=0x7f5809a631f0 =====

2018-01-11 20:17:20.366555 7f5821298700  1 ====== starting new request req=0x7f58212921f0 =====

2018-01-11 20:17:21.684856 7f580ca6f700  1 ====== starting new request req=0x7f580ca691f0 =====

2018-01-11 20:17:21.696645 7f5813a7d700  1 ====== starting new request req=0x7f5813a771f0 =====

2018-01-11 20:17:30.366328 7f580a26a700  1 ====== starting new request req=0x7f580a2641f0 =====

2018-01-11 20:17:30.366715 7f5826aa3700  1 ====== starting new request req=0x7f5826a9d1f0 =====

2018-01-11 20:17:31.687722 7f5816282700  1 ====== starting new request req=0x7f581627c1f0 =====

2018-01-11 20:17:31.700560 7f5809268700  1 ====== starting new request req=0x7f58092621f0 =====

2018-01-11 20:17:40.369569 7f5835ac1700  1 ====== starting new request req=0x7f5835abb1f0 =====

2018-01-11 20:17:40.369956 7f5808266700  1 ====== starting new request req=0x7f58082601f0 =====

2018-01-11 20:17:41.689913 7f5836ac3700  1 ====== starting new request req=0x7f5836abd1f0 =====

2018-01-11 22:17:14.888135 7f5838ac7700 -1 received  signal: Terminated from  PID: 1 task name: /sbin/init  UID: 0

2018-01-11 22:17:14.888161 7f5838ac7700  1 handle_sigterm

2018-01-11 22:17:14.888198 7f5838ac7700  1 handle_sigterm set alarm for 120

2018-01-11 22:17:14.888209 7f58698ebe80 -1 shutting down

2018-01-11 22:18:45.116476 7f5987be2e80  0 deferred set uid:gid to 64045:64045 (ceph:ceph)

2018-01-11 22:18:45.116716 7f5987be2e80  0 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process (unknown), pid 38132

2018-01-11 22:18:45.258934 7f5987be2e80  0 starting handler: civetweb

2018-01-11 22:18:45.266871 7f5987be2e80  1 mgrc service_daemon_register rgw.radosgw1 metadata {arch=x86_64,ceph_version=ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable),cpu=Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz,distro=ubuntu,distro_description=Ubuntu 16.04.3 LTS,distro_version=16.04,frontend_config#0=civetweb port=80,frontend_type#0=civetweb,hostname=ukradosgw1,kernel_description=#127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017,kernel_version=4.4.0-104-generic,mem_swap_kb=12580860,mem_total_kb=12286220,num_handles=1,os=Linux,pid=38132,zone_id=default,zone_name=default,zonegroup_id=default,zonegroup_name=default}

 

It's like the service stops responding…
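
One way to see where the gateway stops answering is to pair each "starting new request" line with its "req done" line and list the requests that never finished. The following is only a rough sketch, assuming the log format shown above; the function name find_unfinished_requests and the regular expression are illustrative, not part of any Ceph tooling, and the sketch assumes a req= address is not reused while an earlier request with that address is still pending.

```python
import re

# Matches the two request markers that appear in the radosgw log above.
# The pattern is an assumption based on those sample lines only.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+-?\d+\s+"
    r"====== (?P<event>starting new request|req done) req=(?P<req>0x[0-9a-f]+)"
)


def find_unfinished_requests(lines):
    """Return sorted (timestamp, req) pairs for requests with no 'req done'."""
    pending = {}  # req address -> timestamp of the most recent start
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # civetweb access lines, signals, blank lines, etc.
        if m["event"] == "starting new request":
            pending[m["req"]] = m["ts"]
        else:
            # A 'req done' whose start is outside the excerpt is ignored.
            pending.pop(m["req"], None)
    return sorted((ts, req) for req, ts in pending.items())
```

Feeding it the lines of the radosgw log file (for example via open(path), where path is wherever the log lives on the gateway host) gives the timestamp of the first request that hung, which can then be compared with the state of the incomplete PGs at that moment.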

 

-Brent

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
