Hi all,
Apologies for all the messages to the list over the past few days.
After an upgrade from 12.2.7 to 12.2.12 (inherited cluster) for an RGW multisite
active/active setup, I am almost constantly seeing 1-10 "recovering shards" when running "radosgw-admin sync status", e.g.:
----------
# radosgw-admin sync status
          realm 8f7fd3fd-f72d-411d-b06b-7b4b579f5f2f (prod)
      zonegroup 60a2cb75-6978-46a3-b830-061c8be9dc75 (prod)
           zone 7fe96e52-d6f7-4ad6-b66e-ecbbbffbc18e (us-east-2)
  metadata sync no sync (zone is master)
      data sync source: ffce148e-3b24-462d-98bf-8c212de31de5 (us-east-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [102]
                        8 shards are recovering
                        recovering shards: [31,37,56,60,83,92,95,127]
----------
Every once in a while it will go to full sync.
This is seen on both the master and secondary side.
There were a bunch of stale reshard instances (on both ends) that I was able
to remove with:
-----
radosgw-admin reshard stale-instances rm
-----
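(For reference, the matching list subcommand shows the stale instance IDs without removing anything, so it can be used to check what "rm" will touch:)
-----
radosgw-admin reshard stale-instances list
-----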
What exactly is a "recovering shard"?
What can be done to troubleshoot/fix this condition? I have verified that rgw_num_rados_handles is 1.
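(For completeness, this is how the setting can be checked via the RGW admin socket; the socket name below is just an example and will differ per deployment:)
-----
# ceph daemon /var/run/ceph/ceph-client.rgw.us-east-2-a.asok config get rgw_num_rados_handles    <-- socket name is just an example
{
    "rgw_num_rados_handles": "1"
}
-----
Is "radosgw-admin sync error list" the right place to look for whatever is putting these shards into recovery?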
Additionally, what exactly do:
# radosgw-admin metadata sync init
# radosgw-admin data sync init
do, and do they tend to take a long time to run?
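My (possibly wrong) reading of the multisite docs is that on the non-master zone the sequence would be roughly the following, with the gateways restarted afterward; the systemd unit name is just an example and varies by deployment:
-----
# radosgw-admin metadata sync init
# radosgw-admin data sync init --source-zone=us-east-1
# systemctl restart ceph-radosgw@rgw.$(hostname -s)    <-- unit name varies by deployment
-----
Before running these against production I'd like to understand what state they reset and how long a resync of this size typically takes.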
thx
Frank