Hi Casey,
I set up a completely fresh cluster on a new VM host; everything is new. I believe it installed cleanly, and because the peer VMs have practically zero latency and effectively unlimited bandwidth between them, this is a better place to experiment. The behavior is the same as on the other cluster.
The realm is “example-test”. It has a single zone group named “us”, with zones “left” and “right”. The master zone is “left” and I am trying to replicate unidirectionally to “right”. “left” is a two-node cluster and “right” is a single-node cluster. Both show “too few PGs per OSD” but are otherwise 100% active+clean. Both clusters have been completely restarted to make sure there are no latent config issues, although only the RGW nodes should require that.
The thread at [1] is the most involved engagement I’ve found with a staff member on the subject, so I checked and believe I attached all the logs that were requested there. They all appear to be consistent and are attached below.
For a start:

[root@right01 ~]# radosgw-admin sync status
          realm d5078dd2-6a6e-49f8-941e-55c02ad58af7 (example-test)
      zonegroup de533461-2593-45d2-8975-99072d860bb2 (us)
           zone 5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe (right)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 479d3f20-d57d-4b37-995b-510ba10756bf (left)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
I tried the information at [2] and do not see any ops in progress, just “linger_ops”. I don’t know what those are, but they probably explain the slow stream of requests back and forth between the two RGW endpoints:

[root@right01 ~]# ceph daemon client.rgw.right01.54395.94074682941968 objecter_requests
{
    "ops": [],
    "linger_ops": [
        {
            "linger_id": 2,
            "pg": "2.16dafda0",
            "osd": 0,
            "object_id": "notify.1",
            "object_locator": "@2",
            "target_object_id": "notify.1",
            "target_object_locator": "@2",
            "paused": 0,
            "used_replica": 0,
            "precalc_pgid": 0,
            "snapid": "head",
            "registered": "1"
        },
...
    ],
    "pool_ops": [],
    "pool_stat_ops": [],
    "statfs_ops": [],
    "command_ops": []
}
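For anyone repeating this check, here is a minimal sketch of how the dump could be summarized instead of read by eye. It assumes the `objecter_requests` output has been saved as valid JSON; the helper name `summarize_objecter` is hypothetical, and the sample is trimmed from the output above to the single linger op that was shown.

```python
import json


def summarize_objecter(dump_text):
    """Count the entries in each list-valued section of an objecter_requests dump."""
    dump = json.loads(dump_text)
    return {
        section: len(entries)
        for section, entries in dump.items()
        if isinstance(entries, list)
    }


# Trimmed sample: only the one linger op quoted in the email is included.
sample = """
{
    "ops": [],
    "linger_ops": [
        {"linger_id": 2, "pg": "2.16dafda0", "osd": 0, "object_id": "notify.1"}
    ],
    "pool_ops": [],
    "pool_stat_ops": [],
    "statfs_ops": [],
    "command_ops": []
}
"""

if __name__ == "__main__":
    for section, count in summarize_objecter(sample).items():
        print(f"{section}: {count}")
```

A non-zero count under "ops" is what I would have expected to see if requests were stuck in flight; here only "linger_ops" is populated.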
The next thing I tried is `radosgw-admin data sync run --source-zone=left` from the right side. I get bursts of messages of the following form:

2019-04-19 21:46:34.281 7f1c006ad580 0 RGW-SYNC:data:sync:shard[1]: ERROR: failed to read remote data log info: ret=-2
2019-04-19 21:46:34.281 7f1c006ad580 0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
When I sorted and filtered the messages, I found that each burst has one RGW-SYNC message for each of the PGs on the left side, identified by the number in “[]”. Since left has 128 PGs, these are the numbers 0 through 127. The bursts happen about once every five seconds.
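The sorting and filtering can be sketched roughly as follows. Note that the log line itself labels the bracketed number as "shard", and the sync status reports 128 data sync shards, so the sketch counts shards 0 through 127. The helper names are hypothetical, and mapping `ret=-2` to ENOENT assumes the value is a negated errno, which I have not confirmed in the code.

```python
import errno
import re
from collections import Counter

# Matches the per-shard error lines quoted above; captures shard number and ret.
SHARD_ERROR = re.compile(r"RGW-SYNC:data:sync:shard\[(\d+)\]: ERROR: .*ret=(-?\d+)")


def tally_shard_errors(lines):
    """Count error lines per (shard, ret) pair."""
    counts = Counter()
    for line in lines:
        match = SHARD_ERROR.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


def missing_shards(counts, expected=128):
    """Return shard numbers in range(expected) that never logged an error."""
    seen = {shard for shard, _ret in counts}
    return sorted(set(range(expected)) - seen)


def errno_name(ret):
    """Assumption: ret is a negated errno value, so -2 would map to ENOENT."""
    return errno.errorcode.get(-ret, "unknown")


# The two lines quoted in the email; a real run would read the full log.
sample = [
    "2019-04-19 21:46:34.281 7f1c006ad580 0 RGW-SYNC:data:sync:shard[1]: "
    "ERROR: failed to read remote data log info: ret=-2",
    "2019-04-19 21:46:34.281 7f1c006ad580 0 meta sync: ERROR: "
    "RGWBackoffControlCR called coroutine returned -2",
]

if __name__ == "__main__":
    counts = tally_shard_errors(sample)
    for (shard, ret), n in sorted(counts.items()):
        print(f"shard {shard}: ret={ret} ({errno_name(ret)}) x{n}")
    print(f"{len(missing_shards(counts))} shards not seen in this sample")
```

Run against one complete burst, `missing_shards` should come back empty if every shard really does log the error once.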
The packet traces between the nodes during the `data sync run` are mostly requests and responses of the following form:

When I stop the `data sync run`, these 404s stop, so clearly the `data sync run` isn’t changing a state in the RGW, but doing something synchronously. In the past, I have done a `data sync init`, but it doesn’t seem like doing it repeatedly will make a difference, so I didn’t do it any more.
NEXT STEPS:
I am working on how to get better logging output from the daemons and hope to find something in there that will help. If I am lucky, I will find something and can report back so this thread is useful for others. If I have not written back, I probably haven’t found anything, so I would be grateful for any leads.
Kind regards and thank you!
Brian
CONFIG DUMPS:
[root@left01 ~]# radosgw-admin period get-current
{
    "current_period": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c"
}
[root@left01 ~]# radosgw-admin period get cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c
{
    "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
    "epoch": 6,
    "predecessor_uuid": "1f87151a-a1e4-469b-9f90-c309d7b64d80",
    "sync_status": [],
    "period_map": {
        "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
        "zonegroups": [
            {
                "id": "de533461-2593-45d2-8975-99072d860bb2",
                "name": "us",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [ ],
                "hostnames": [],
                "hostnames_s3website": [],
                "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
                "zones": [
                    {
                        "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
                        "name": "left",
                        "endpoints": [ ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": [],
                        "redirect_zone": ""
                    },
                    {
                        "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
                        "name": "right",
                        "endpoints": [ ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": [],
                        "redirect_zone": ""
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": [],
                        "storage_classes": [
                            "STANDARD"
                        ]
                    }
                ],
                "default_placement": "default-placement",
                "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
            }
        ],
        "short_zone_ids": [
            {
                "key": "479d3f20-d57d-4b37-995b-510ba10756bf",
                "val": 1817029288
            },
            {
                "key": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
                "val": 1573215025
            }
        ]
    },
    "master_zonegroup": "de533461-2593-45d2-8975-99072d860bb2",
    "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
    "period_config": {
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "user_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        }
    },
    "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7",
    "realm_name": "example-test",
    "realm_epoch": 2
}
[root@left01 ~]# radosgw-admin zonegroup get
{
    "id": "de533461-2593-45d2-8975-99072d860bb2",
    "name": "us",
    "api_name": "us",
    "is_master": "true",
    "endpoints": [ ],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
    "zones": [
        {
            "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
            "name": "left",
            "endpoints": [ ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": [],
            "redirect_zone": ""
        },
        {
            "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
            "name": "right",
            "endpoints": [ ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": [],
            "redirect_zone": ""
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "STANDARD"
            ]
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
}
[root@left01 ~]# radosgw-admin period get
{
    "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
    "epoch": 6,
    "predecessor_uuid": "1f87151a-a1e4-469b-9f90-c309d7b64d80",
    "sync_status": [],
    "period_map": {
        "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
        "zonegroups": [
            {
                "id": "de533461-2593-45d2-8975-99072d860bb2",
                "name": "us",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [ ],
                "hostnames": [],
                "hostnames_s3website": [],
                "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
                "zones": [
                    {
                        "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
                        "name": "left",
                        "endpoints": [ ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": [],
                        "redirect_zone": ""
                    },
                    {
                        "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
                        "name": "right",
                        "endpoints": [ ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": [],
                        "redirect_zone": ""
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": [],
                        "storage_classes": [
                            "STANDARD"
                        ]
                    }
                ],
                "default_placement": "default-placement",
                "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
            }
        ],
        "short_zone_ids": [
            {
                "key": "479d3f20-d57d-4b37-995b-510ba10756bf",
                "val": 1817029288
            },
            {
                "key": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
                "val": 1573215025
            }
        ]
    },
    "master_zonegroup": "de533461-2593-45d2-8975-99072d860bb2",
    "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
    "period_config": {
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "user_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        }
    },
    "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7",
    "realm_name": "example-test",
    "realm_epoch": 2
}
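To make the dumps easier to compare at a glance, here is a rough sketch that pulls the per-zone fields out of a `period get` dump. It assumes the output has been saved as valid JSON with straight quotes; the helper name `zone_summary` is hypothetical, and the sample is trimmed from the dump above to the fields the helper reads.

```python
import json


def zone_summary(period_text):
    """List each zone with its master flag, endpoints, and log settings."""
    period = json.loads(period_text)
    rows = []
    for zonegroup in period["period_map"]["zonegroups"]:
        for zone in zonegroup["zones"]:
            rows.append({
                "zonegroup": zonegroup["name"],
                "zone": zone["name"],
                "is_master": zone["id"] == zonegroup["master_zone"],
                "endpoints": zone["endpoints"],
                "log_meta": zone["log_meta"],
                "log_data": zone["log_data"],
            })
    return rows


# Trimmed sample: only the fields read by zone_summary are kept.
sample = """
{
    "period_map": {
        "zonegroups": [
            {
                "name": "us",
                "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
                "zones": [
                    {
                        "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
                        "name": "left",
                        "endpoints": [],
                        "log_meta": "false",
                        "log_data": "true"
                    },
                    {
                        "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
                        "name": "right",
                        "endpoints": [],
                        "log_meta": "false",
                        "log_data": "true"
                    }
                ]
            }
        ]
    }
}
"""

if __name__ == "__main__":
    for row in zone_summary(sample):
        print(row)
```

The same function could be run on the output from each side to confirm that both clusters report the same period contents.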