We have 3 RadosGW servers running behind HAProxy so that clients can connect to the Ceph cluster as if it were an Amazon S3 bucket. After all the failures and upgrade issues were resolved, I cannot get the RadosGW servers to stay online.

They were upgraded to Luminous, and I upgraded the OS on them to Ubuntu 16.04 before doing so. They used to run Apache, since they ran Hammer and, before that, Firefly. I removed Apache before upgrading to Luminous.

They start up and run for about 4-6 hours before all three start to go offline. Client traffic is light right now, as we are just testing file reads and writes before we reactivate them (the clients switched back to Amazon while we fix them).

Could the 4 incomplete PGs be causing them to go offline? The last time I saw an issue like this was when recovery wasn't working 100%, so it seems related: they haven't been stable since we upgraded. That was also after the failures we had, though, which is why I am not trying to blame the upgrade specifically.
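To pin down when each gateway stops answering, independently of HAProxy, a small probe can send a request similar to the `HEAD /` health check seen in the log and record whether a reply arrives within a timeout. The following is only a sketch: the function names `probe` and `probe_all`, the 5-second timeout, and the host list are assumptions for illustration. Port 80 matches the `civetweb port=80` frontend setting reported in the log.

```python
import http.client
import time


def probe(host, port=80, timeout=5.0):
    """Return the HTTP status of 'HEAD /', or None if no reply arrives in time."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Covers refused connections, socket timeouts, and malformed replies.
        return None
    finally:
        conn.close()


def probe_all(hosts, port=80, timeout=5.0):
    """Probe each gateway once and return {host: (timestamp, status_or_None)}."""
    results = {}
    for host in hosts:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        results[host] = (stamp, probe(host, port, timeout))
    return results
```

Calling `probe_all(["radosgw1", "radosgw2", "radosgw3"])` (hypothetical host names) from cron every minute and logging the result would show whether the three gateways hang at the same moment or one after another. A `None` from a host that still accepts the TCP connection suggests the process is alive but no longer serving requests.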
When I look at the radosgw log, this is what I see (the first 2 lines show up plenty before this and are health checks from the HAProxy server; the next two are file requests that fail with a 404, I am guessing; the last one is me restarting the service):

2018-01-11 20:14:36.640577 7f5826aa3700 1 ====== req done req=0x7f5826a9d1f0 op status=0 http_status=200 ======
2018-01-11 20:14:36.640602 7f5826aa3700 1 civetweb: 0x56202c567000: 192.168.120.21 - - [11/Jan/2018:20:14:36 +0000] "HEAD / HTTP/1.0" 1 0 - -
2018-01-11 20:14:36.640835 7f5816282700 1 ====== req done req=0x7f581627c1f0 op status=0 http_status=200 ======
2018-01-11 20:14:36.640859 7f5816282700 1 civetweb: 0x56202c610000: 192.168.120.22 - - [11/Jan/2018:20:14:36 +0000] "HEAD / HTTP/1.0" 1 0 - -
2018-01-11 20:14:36.761917 7f5835ac1700 1 ====== starting new request req=0x7f5835abb1f0 =====
2018-01-11 20:14:36.763936 7f5835ac1700 1 ====== req done req=0x7f5835abb1f0 op status=0 http_status=404 ======
2018-01-11 20:14:36.763983 7f5835ac1700 1 civetweb: 0x56202c4ce000: 192.168.120.21 - - [11/Jan/2018:20:14:36 +0000] "HEAD /Jobimages/vendor05/10/3962896/3962896_cover.pdf HTTP/1.1" 1 0 - aws-sdk-dotnet-35/2.0.2.2 .NET Runtime/4.0 .NET Framework/4.0 OS/6.2.9200.0 FileIO
2018-01-11 20:14:36.772611 7f5808266700 1 ====== starting new request req=0x7f58082601f0 =====
2018-01-11 20:14:36.773733 7f5808266700 1 ====== req done req=0x7f58082601f0 op status=0 http_status=404 ======
2018-01-11 20:14:36.773769 7f5808266700 1 civetweb: 0x56202c6aa000: 192.168.120.21 - - [11/Jan/2018:20:14:36 +0000] "HEAD /Jobimages/vendor05/10/3962896/3962896_cover.pdf HTTP/1.1" 1 0 - aws-sdk-dotnet-35/2.0.2.2 .NET Runtime/4.0 .NET Framework/4.0 OS/6.2.9200.0 FileIO
2018-01-11 20:14:38.163617 7f5836ac3700 1 ====== starting new request req=0x7f5836abd1f0 =====
2018-01-11 20:14:38.165352 7f5836ac3700 1 ====== req done req=0x7f5836abd1f0 op status=0 http_status=404 ======
2018-01-11 20:14:38.165401 7f5836ac3700 1 civetweb: 0x56202c4e2000: 192.168.120.21 - - [11/Jan/2018:20:14:38 +0000] "HEAD /Jobimages/vendor05/10/3445645/3445645_cover.pdf HTTP/1.1" 1 0 - aws-sdk-dotnet-35/2.0.2.2 .NET Runtime/4.0 .NET Framework/4.0 OS/6.2.9200.0 FileIO
2018-01-11 20:14:38.170551 7f5807a65700 1 ====== starting new request req=0x7f5807a5f1f0 =====
2018-01-11 20:14:40.322236 7f58352c0700 1 ====== starting new request req=0x7f58352ba1f0 =====
2018-01-11 20:14:40.323468 7f5834abf700 1 ====== starting new request req=0x7f5834ab91f0 =====
2018-01-11 20:14:41.643365 7f58342be700 1 ====== starting new request req=0x7f58342b81f0 =====
2018-01-11 20:14:41.643358 7f58312b8700 1 ====== starting new request req=0x7f58312b21f0 =====
2018-01-11 20:14:50.324196 7f5829aa9700 1 ====== starting new request req=0x7f5829aa31f0 =====
2018-01-11 20:14:50.325622 7f58332bc700 1 ====== starting new request req=0x7f58332b61f0 =====
2018-01-11 20:14:51.645678 7f58362c2700 1 ====== starting new request req=0x7f58362bc1f0 =====
2018-01-11 20:14:51.645671 7f582e2b2700 1 ====== starting new request req=0x7f582e2ac1f0 =====
2018-01-11 20:15:00.326452 7f5815a81700 1 ====== starting new request req=0x7f5815a7b1f0 =====
2018-01-11 20:15:00.328787 7f5828aa7700 1 ====== starting new request req=0x7f5828aa11f0 =====
2018-01-11 20:15:01.648196 7f580ea73700 1 ====== starting new request req=0x7f580ea6d1f0 =====
2018-01-11 20:15:01.648698 7f5830ab7700 1 ====== starting new request req=0x7f5830ab11f0 =====
2018-01-11 20:15:10.328810 7f5832abb700 1 ====== starting new request req=0x7f5832ab51f0 =====
2018-01-11 20:15:10.329541 7f582f2b4700 1 ====== starting new request req=0x7f582f2ae1f0 =====
2018-01-11 20:15:11.650655 7f582d2b0700 1 ====== starting new request req=0x7f582d2aa1f0 =====
2018-01-11 20:15:11.651401 7f582aaab700 1 ====== starting new request req=0x7f582aaa51f0 =====
2018-01-11 20:15:20.332032 7f582c2ae700 1 ====== starting new request req=0x7f582c2a81f0 =====
2018-01-11 20:15:20.332046 7f582b2ac700 1 ====== starting new request req=0x7f582b2a61f0 =====
2018-01-11 20:15:21.653675 7f582229a700 1 ====== starting new request req=0x7f58222941f0 =====
2018-01-11 20:15:21.655867 7f5821a99700 1 ====== starting new request req=0x7f5821a931f0 =====
2018-01-11 20:15:30.334192 7f580ba6d700 1 ====== starting new request req=0x7f580ba671f0 =====
2018-01-11 20:15:30.334263 7f58252a0700 1 ====== starting new request req=0x7f582529a1f0 =====
2018-01-11 20:15:31.656023 7f582329c700 1 ====== starting new request req=0x7f58232961f0 =====
2018-01-11 20:15:31.658730 7f5825aa1700 1 ====== starting new request req=0x7f5825a9b1f0 =====
2018-01-11 20:15:40.346908 7f5827aa5700 1 ====== starting new request req=0x7f5827a9f1f0 =====
2018-01-11 20:15:40.346968 7f582429e700 1 ====== starting new request req=0x7f58242981f0 =====
2018-01-11 20:15:41.659509 7f5820296700 1 ====== starting new request req=0x7f58202901f0 =====
2018-01-11 20:15:41.661910 7f5806262700 1 ====== starting new request req=0x7f580625c1f0 =====
2018-01-11 20:15:50.339676 7f5820a97700 1 ====== starting new request req=0x7f5820a911f0 =====
2018-01-11 20:15:50.340447 7f5833abd700 1 ====== starting new request req=0x7f5833ab71f0 =====
2018-01-11 20:15:51.661637 7f581b28c700 1 ====== starting new request req=0x7f581b2861f0 =====
2018-01-11 20:15:51.665464 7f5824a9f700 1 ====== starting new request req=0x7f5824a991f0 =====
2018-01-11 20:16:00.342250 7f581fa95700 1 ====== starting new request req=0x7f581fa8f1f0 =====
2018-01-11 20:16:00.342296 7f580aa6b700 1 ====== starting new request req=0x7f580aa651f0 =====
2018-01-11 20:16:01.663620 7f581ea93700 1 ====== starting new request req=0x7f581ea8d1f0 =====
2018-01-11 20:16:01.668467 7f582a2aa700 1 ====== starting new request req=0x7f582a2a41f0 =====
2018-01-11 20:16:10.344220 7f58302b6700 1 ====== starting new request req=0x7f58302b01f0 =====
2018-01-11 20:16:10.345422 7f581ba8d700 1 ====== starting new request req=0x7f581ba871f0 =====
2018-01-11 20:16:11.664968 7f582baad700 1 ====== starting new request req=0x7f582baa71f0 =====
2018-01-11 20:16:11.671974 7f582dab1700 1 ====== starting new request req=0x7f582daab1f0 =====
2018-01-11 20:16:20.345984 7f5810276700 1 ====== starting new request req=0x7f58102701f0 =====
2018-01-11 20:16:20.346372 7f581f294700 1 ====== starting new request req=0x7f581f28e1f0 =====
2018-01-11 20:16:21.667324 7f5819a89700 1 ====== starting new request req=0x7f5819a831f0 =====
2018-01-11 20:16:21.675243 7f5823a9d700 1 ====== starting new request req=0x7f5823a971f0 =====
2018-01-11 20:16:30.347943 7f58292a8700 1 ====== starting new request req=0x7f58292a21f0 =====
2018-01-11 20:16:30.348865 7f581a28a700 1 ====== starting new request req=0x7f581a2841f0 =====
2018-01-11 20:16:31.670269 7f580f274700 1 ====== starting new request req=0x7f580f26e1f0 =====
2018-01-11 20:16:31.678598 7f5818286700 1 ====== starting new request req=0x7f58182801f0 =====
2018-01-11 20:16:40.350418 7f58272a4700 1 ====== starting new request req=0x7f582729e1f0 =====
2018-01-11 20:16:40.351565 7f582eab3700 1 ====== starting new request req=0x7f582eaad1f0 =====
2018-01-11 20:16:41.671624 7f581e292700 1 ====== starting new request req=0x7f581e28c1f0 =====
2018-01-11 20:16:41.682522 7f5819288700 1 ====== starting new request req=0x7f58192821f0 =====
2018-01-11 20:16:50.352821 7f5817a85700 1 ====== starting new request req=0x7f5817a7f1f0 =====
2018-01-11 20:16:50.357997 7f5806a63700 1 ====== starting new request req=0x7f5806a5d1f0 =====
2018-01-11 20:16:51.674867 7f581227a700 1 ====== starting new request req=0x7f58122741f0 =====
2018-01-11 20:16:51.685882 7f5811a79700 1 ====== starting new request req=0x7f5811a731f0 =====
2018-01-11 20:17:00.356027 7f5812a7b700 1 ====== starting new request req=0x7f5812a751f0 =====
2018-01-11 20:17:00.360732 7f581c28e700 1 ====== starting new request req=0x7f581c2881f0 =====
2018-01-11 20:17:01.678524 7f5815280700 1 ====== starting new request req=0x7f581527a1f0 =====
2018-01-11 20:17:01.689199 7f5816a83700 1 ====== starting new request req=0x7f5816a7d1f0 =====
2018-01-11 20:17:10.358813 7f580fa75700 1 ====== starting new request req=0x7f580fa6f1f0 =====
2018-01-11 20:17:10.363121 7f581da91700 1 ====== starting new request req=0x7f581da8b1f0 =====
2018-01-11 20:17:11.682017 7f581427e700 1 ====== starting new request req=0x7f58142781f0 =====
2018-01-11 20:17:11.693168 7f5811278700 1 ====== starting new request req=0x7f58112721f0 =====
2018-01-11 20:17:20.366413 7f5809a69700 1 ====== starting new request req=0x7f5809a631f0 =====
2018-01-11 20:17:20.366555 7f5821298700 1 ====== starting new request req=0x7f58212921f0 =====
2018-01-11 20:17:21.684856 7f580ca6f700 1 ====== starting new request req=0x7f580ca691f0 =====
2018-01-11 20:17:21.696645 7f5813a7d700 1 ====== starting new request req=0x7f5813a771f0 =====
2018-01-11 20:17:30.366328 7f580a26a700 1 ====== starting new request req=0x7f580a2641f0 =====
2018-01-11 20:17:30.366715 7f5826aa3700 1 ====== starting new request req=0x7f5826a9d1f0 =====
2018-01-11 20:17:31.687722 7f5816282700 1 ====== starting new request req=0x7f581627c1f0 =====
2018-01-11 20:17:31.700560 7f5809268700 1 ====== starting new request req=0x7f58092621f0 =====
2018-01-11 20:17:40.369569 7f5835ac1700 1 ====== starting new request req=0x7f5835abb1f0 =====
2018-01-11 20:17:40.369956 7f5808266700 1 ====== starting new request req=0x7f58082601f0 =====
2018-01-11 20:17:41.689913 7f5836ac3700 1 ====== starting new request req=0x7f5836abd1f0 =====
2018-01-11 22:17:14.888135 7f5838ac7700 -1 received signal: Terminated from PID: 1 task name: /sbin/init UID: 0
2018-01-11 22:17:14.888161 7f5838ac7700 1 handle_sigterm
2018-01-11 22:17:14.888198 7f5838ac7700 1 handle_sigterm set alarm for 120
2018-01-11 22:17:14.888209 7f58698ebe80 -1 shutting down
2018-01-11 22:18:45.116476 7f5987be2e80 0 deferred set uid:gid to 64045:64045 (ceph:ceph)
2018-01-11 22:18:45.116716 7f5987be2e80 0 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process (unknown), pid 38132
2018-01-11 22:18:45.258934 7f5987be2e80 0 starting handler: civetweb
2018-01-11 22:18:45.266871 7f5987be2e80 1 mgrc service_daemon_register rgw.radosgw1 metadata {arch=x86_64,ceph_version=ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable),cpu= Intel(R) Xeon(R) CPU L5520 @ 2.27GHz,distro=ubuntu,distro_description=Ubuntu 16.04.3 LTS,distro_version=16.04,frontend_config#0=civetweb port=80,frontend_type#0=civetweb,hostname=ukradosgw1,kernel_description=#127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017,kernel_version=4.4.0-104-generic,mem_swap_kb=12580860,mem_total_kb=12286220,num_handles=1,os=Linux,pid=38132,zone_id=default,zone_name=default,zonegroup_id=default,zonegroup_name=default}

It's like the service stops responding...

-Brent
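In the excerpt, no "req done" line appears after 20:14:38.165352. Every later request logs "starting new request" and nothing more, which fits the impression that the service stops responding. A small script can find the point where this begins by pairing each start line with its completion. This is a minimal sketch: it assumes the log format shown in this post, and the function name `unfinished_requests` is made up for illustration.

```python
import re

# Patterns taken from the log lines quoted in this post.
START = re.compile(r"starting new request req=(0x[0-9a-f]+)")
DONE = re.compile(r"req done req=(0x[0-9a-f]+)")


def unfinished_requests(lines):
    """Return sorted (timestamp, req) pairs for requests with no later 'req done'."""
    open_reqs = {}
    for line in lines:
        started = START.search(line)
        if started:
            # The first two fields are the date and time of the entry.
            stamp = " ".join(line.split()[:2])
            # req addresses are reused, so a later start replaces an older one.
            open_reqs[started.group(1)] = stamp
            continue
        finished = DONE.search(line)
        if finished:
            open_reqs.pop(finished.group(1), None)
    return sorted((stamp, req) for req, stamp in open_reqs.items())
```

Running it over the log file, for example `unfinished_requests(open(path))` with `path` pointing at the gateway's log, lists the requests still outstanding at the end of the file. The earliest timestamp marks when the gateway stopped completing requests. The sketch does not reset its state when the daemon restarts, so it is best run on the portion of the log belonging to a single run.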