Multisite RGW - stuck metadata shards (metadata is behind on X shards)

"P. O." <posdub@xxxxxxxxx> · Wed, 11 Sep 2019 17:32:05 +0200

Hi all,

In my environment with two replicated clusters (Mimic 13.2.6), I have a problem with stuck metadata shards.

[Master root@rgw-1]$ radosgw-admin sync status 
          realm b144111d-8176-47e5-aa3a-85c65032e8a9 (realm)
      zonegroup 2ead77cb-f5c2-4d62-9959-12912828fb4b (1_zonegroup)
           zone f67241ae-8d0d-458d-91d7-f59054679c73 (1_zone)
  metadata sync no sync (zone is master)
      data sync source: db285b04-10d4-4dbc-9de0-299f3da7c083 (2_zone)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards

[Secondary root@rgw-1 ~]$ radosgw-admin sync status 
          realm b144111d-8176-47e5-aa3a-85c65032e8a9 (realm)
      zonegroup 2ead77cb-f5c2-4d62-9959-12912828fb4b (1_zonegroup)
           zone db285b04-10d4-4dbc-9de0-299f3da7c083 (2_zone)
  metadata sync syncing
    full sync: 0/64 shards
    incremental sync: 64/64 shards
    metadata is behind on 3 shards                                       <--------- Here
    behind shards: [12,22,42]                                            <--------- Here
    oldest incremental change not applied: 2019-09-10 08:40:55.0.327671s <--------- And here
      data sync source: f67241ae-8d0d-458d-91d7-f59054679c73 (dc1_zone)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
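
To follow the behind shards over time, the list can be pulled out of the sync status text. This is only a minimal sketch: it assumes the output layout shown above (a "behind shards: [...]" line in the metadata section, placed before "data sync source"), and the function name behind_metadata_shards is made up for illustration.

```python
import re


def behind_metadata_shards(sync_status_text):
    """Return the metadata shard IDs reported as behind, or [] if none."""
    # Keep only the metadata section, so a "behind shards" line that
    # belongs to the data sync section is not picked up by mistake.
    # Assumption: the data section starts with "data sync source".
    metadata_part = sync_status_text.split("data sync source", 1)[0]
    match = re.search(r"behind shards:\s*\[([0-9,\s]*)\]", metadata_part)
    if not match:
        return []
    return [int(s) for s in match.group(1).split(",") if s.strip()]


# Sample text copied from the secondary zone output above.
sample_status = """\
  metadata sync syncing
    full sync: 0/64 shards
    incremental sync: 64/64 shards
    metadata is behind on 3 shards
    behind shards: [12,22,42]
      data sync source: f67241ae-8d0d-458d-91d7-f59054679c73 (dc1_zone)
                        syncing
"""

print(behind_metadata_shards(sample_status))
```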

On the master zone I found the following mdlog entries:

For shard 12: the mdlog entries relate to only one bucket.
[Master root@rgw-1 2]$ radosgw-admin mdlog list --shard-id=12 |grep id |wc -l
36
[Master root@rgw-1 2]$ radosgw-admin mdlog list --shard-id=12 |grep time | head -n 1
        "timestamp": "2019-09-10 06:47:51.919457421Z",
[Master root@rgw-1 2]$ radosgw-admin mdlog list --shard-id=12 |grep time | tail -n 1
        "timestamp": "2019-09-11 06:47:16.386356522Z",

For shard 22: the mdlog entries relate to two buckets.
[Master root@rgw-1 2]$ radosgw-admin mdlog list --shard-id=22 |grep id |wc -l
26
[Master root@rgw-1 2]$ radosgw-admin mdlog list --shard-id=22 |grep time | head -n 1
        "timestamp": "2019-09-10 06:40:55.327671028Z",
[Master root@rgw-1 2]$ radosgw-admin mdlog list --shard-id=22 |grep time | tail -n 1
        "timestamp": "2019-09-10 10:19:04.812461761Z",

For shard 42: there are no mdlog entries.
[Master root@rgw-1 2]$ radosgw-admin mdlog list --shard-id=42
[]
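
The grep-based counts above match every line containing "id", so they may count more than one line per entry if nested fields also contain that string. A minimal sketch of a stricter per-shard summary follows, assuming mdlog list prints a JSON array of objects with a "timestamp" field in the format shown above. The "section" and "name" keys, the sample entries, and the function name summarize_mdlog are assumptions for illustration, not output copied from the cluster.

```python
import json
from collections import Counter


def summarize_mdlog(mdlog_json):
    """Summarize the JSON array printed by `radosgw-admin mdlog list`."""
    entries = json.loads(mdlog_json)
    # Timestamps in the fixed-width form "YYYY-MM-DD HH:MM:SS.nnnnnnnnnZ"
    # sort correctly as plain strings.
    timestamps = sorted(e["timestamp"] for e in entries if "timestamp" in e)
    # Assumed keys: "section" and "name"; missing keys are counted as "?".
    targets = Counter(
        (e.get("section", "?"), e.get("name", "?")) for e in entries
    )
    return {
        "entries": len(entries),
        "oldest": timestamps[0] if timestamps else None,
        "newest": timestamps[-1] if timestamps else None,
        "targets": dict(targets),
    }


# Hypothetical sample: only the two timestamps are taken from shard 22 above.
sample_mdlog = """
[
    {"id": "example-1", "section": "bucket.instance", "name": "bucket-a",
     "timestamp": "2019-09-10 10:19:04.812461761Z"},
    {"id": "example-2", "section": "bucket.instance", "name": "bucket-b",
     "timestamp": "2019-09-10 06:40:55.327671028Z"}
]
"""

print(summarize_mdlog(sample_mdlog))
print(summarize_mdlog("[]"))  # an empty shard, like shard 42
```

Parsing the JSON instead of grepping also makes it easy to see which metadata objects the entries refer to, as long as the assumed keys are present.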

The shards have been stuck for more than a day. Restarting the RGWs did not help.
How can I unblock these shards? Is it safe to delete these mdlog entries in the master zone?

Any help or suggestions would be appreciated.

Best regards,
PO

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com