Hi all,
Apologies for all the messages to the list over the past few days.
After an upgrade from 12.2.7 to 12.2.12 (inherited cluster) for an RGW multisite
active/active setup, I am almost constantly seeing 1-10 "recovering shards" when running "radosgw-admin sync status", e.g.:
----------
# radosgw-admin sync status
          realm 8f7fd3fd-f72d-411d-b06b-7b4b579f5f2f (prod)
      zonegroup 60a2cb75-6978-46a3-b830-061c8be9dc75 (prod)
           zone 7fe96e52-d6f7-4ad6-b66e-ecbbbffbc18e (us-east-2)
  metadata sync no sync (zone is master)
      data sync source: ffce148e-3b24-462d-98bf-8c212de31de5 (us-east-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [102]
                        8 shards are recovering
                        recovering shards: [31,37,56,60,83,92,95,127]
----------
Every once in a while it will go to full sync.
This is seen on both the master and secondary side.
There were a bunch of stale reshard instances (on both ends) that I was able
to remove with:
-----
radosgw-admin reshard stale-instances rm
-----
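(For reference, the matching list subcommand shows the stale instance IDs without removing anything, so it can be used to check what "rm" will touch:)
-----
radosgw-admin reshard stale-instances list
-----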
What exactly is a "recovering shard"?
What can be done to troubleshoot/fix this condition? I have verified that rgw_num_rados_handles is 1.
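(For completeness, this is how the setting can be checked via the RGW admin socket; the socket name below is just an example and will differ per deployment:)
-----
# ceph daemon /var/run/ceph/ceph-client.rgw.us-east-2-a.asok config get rgw_num_rados_handles    <-- socket name is just an example
{
    "rgw_num_rados_handles": "1"
}
-----
Is "radosgw-admin sync error list" the right place to look for whatever is putting these shards into recovery?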
Additionally, what exactly do:
# radosgw-admin metadata sync init
# radosgw-admin data sync init
do, and do they tend to take a long time to run?
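My (possibly wrong) reading of the multisite docs is that on the non-master zone the sequence would be roughly the following, with the gateways restarted afterward; the systemd unit name is just an example and varies by deployment:
-----
# radosgw-admin metadata sync init
# radosgw-admin data sync init --source-zone=us-east-1
# systemctl restart ceph-radosgw@rgw.$(hostname -s)    <-- unit name varies by deployment
-----
Before running these against production I'd like to understand what state they reset and how long a resync of this size typically takes.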
thx
Frank