Hey Christian,
I'm making a wild guess, but assuming this is 12.2.8. If so, is it possible for you to upgrade to 12.2.11? There have been rgw multisite bug fixes for metadata syncing and data syncing (two separate issues) that you could be hitting.
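In case it helps narrow that down, the exact version running on each gateway can be confirmed with something like the following (standard luminous-era commands; output omitted here):

    # version of the rgw binaries on the gateway host
    radosgw-admin --version
    # versions of all running daemons, from any node with an admin keyring
    ceph versions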
Thanks,
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Christian Rice <crice@xxxxxxxxxxx>
Sent: Wednesday, February 27, 2019 7:05 PM
To: ceph-users
Subject: radosgw sync falling behind regularly

Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters in one zonegroup.
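(For reference, the realm/zonegroup/zone layout shown in the status output below can be dumped on any of the clusters with something like the standard commands here; just a pointer, output trimmed:)

    # realm/zonegroup/zone layout as committed in the current period
    radosgw-admin period get
    # or only the zonegroup this cluster belongs to
    radosgw-admin zonegroup get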
Often we find either metadata or data sync falling behind, and it doesn’t ever seem to recover until…we restart the endpoint radosgw target service.
e.g. at 15:45:40:
dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is behind on 2 shards
                behind shards: [19,41]
                oldest incremental change not applied: 2019-02-27 14:42:24.0.408263s
      data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
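(While it's stuck like this, more detail on the behind shards can presumably be pulled with something like the commands below; sync error list in particular shows whether a shard has recorded a hard error rather than just lag:)

    # per-shard metadata sync markers on this zone
    radosgw-admin metadata sync status
    # any recorded sync errors
    radosgw-admin sync error list
    # data sync state against a specific source zone
    radosgw-admin data sync status --source-zone=sv5-corp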
so at 15:46:07:
dc11-ceph-rgw1:/var/log/ceph# sudo systemctl restart ceph-radosgw@rgw.dc11-ceph-rgw1.service
and by the time I checked at 15:48:08:
dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
There’s no way this is “lag.” It’s stuck, and happens frequently, though perhaps not daily. Any suggestions? Our cluster isn’t heavily used yet, but it’s production.
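(Until the root cause is found, the restart workaround above could in principle be automated with a crude watchdog along these lines; purely a sketch, assuming the service name used earlier and that a healthy cluster prints no "behind"/"recovering" lines in sync status:)

    #!/bin/sh
    # Hypothetical watchdog sketch: restart the local radosgw if multisite
    # sync reports shards that are behind or recovering.
    if radosgw-admin sync status 2>/dev/null | grep -Eq 'behind|recovering'; then
        systemctl restart ceph-radosgw@rgw.dc11-ceph-rgw1.service
    fi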
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com