Re: Multi-site replication speed

Hi Casey,

I set up a completely fresh cluster on a new VM host; everything is brand new. I believe it installed cleanly. Because the peer VMs have practically zero latency and effectively unlimited bandwidth between them, this is a better place to experiment. The behavior is the same as on the other cluster.

The realm is “example-test”. It has a single zonegroup named “us”, with two zones, “left” and “right”. The master zone is “left”, and I am trying to replicate one way, from “left” to “right”. “left” is a two-node cluster and “right” is a single-node cluster. Both report “too few PGs per OSD” but are otherwise 100% active+clean. Both clusters have been completely restarted to rule out latent config issues, although only the RGW nodes should need that.

The thread at [1] is the most detailed engagement with a staff member on this subject that I have found. I went through it and believe I have collected all the logs requested there. They all appear consistent and are included below.

To start:
[root@right01 ~]# radosgw-admin sync status
          realm d5078dd2-6a6e-49f8-941e-55c02ad58af7 (example-test)
      zonegroup de533461-2593-45d2-8975-99072d860bb2 (us)
           zone 5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe (right)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 479d3f20-d57d-4b37-995b-510ba10756bf (left)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
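The status above is plain text. Comparing it across repeated runs (for example, before and after writing test objects on left) is easier if the counters are extracted programmatically. The following minimal Python sketch assumes exactly the layout shown above. The function name `parse_sync_status` is hypothetical, and other Ceph releases, or zones with several data sync sources, may print a different layout.

```python
import re

# Matches lines such as "full sync: 0/128 shards" (format taken from the output above).
SHARD_LINE = re.compile(r"(full|incremental) sync: (\d+)/(\d+) shards")


def parse_sync_status(text: str) -> dict:
    """Extract shard counters and the 'caught up' flag for the metadata
    and data sections of saved `radosgw-admin sync status` output."""
    result = {
        "metadata": {"caught_up": False},
        "data": {"caught_up": False},
    }
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        # Section headers as printed above; other layouts are not handled.
        if stripped.startswith("metadata sync"):
            section = "metadata"
        elif stripped.startswith("data sync source"):
            section = "data"
        if section is None:
            continue  # realm/zonegroup/zone header lines
        match = SHARD_LINE.match(stripped)
        if match:
            kind, done, total = match.group(1), int(match.group(2)), int(match.group(3))
            result[section][kind] = (done, total)
        elif "is caught up with" in stripped:
            result[section]["caught_up"] = True
    return result
```

To use it, save the command output to a file (for example `radosgw-admin sync status > status.txt`) and pass the file's contents to `parse_sync_status`. It returns the metadata and data shard counts and records whether each section reports being caught up.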

I tried the steps at [2] and do not see any ops in progress, only “linger_ops”. I don’t know what those are, but they may explain the slow stream of requests back and forth between the two RGW endpoints:
[root@right01 ~]# ceph daemon client.rgw.right01.54395.94074682941968 objecter_requests
{
    "ops": [],
    "linger_ops": [
        {
            "linger_id": 2,
            "pg": "2.16dafda0",
            "osd": 0,
            "object_id": "notify.1",
            "object_locator": "@2",
            "target_object_id": "notify.1",
            "target_object_locator": "@2",
            "paused": 0,
            "used_replica": 0,
            "precalc_pgid": 0,
            "snapid": "head",
            "registered": "1"
        },
        ...
    ],
    "pool_ops": [],
    "pool_stat_ops": [],
    "statfs_ops": [],
    "command_ops": []
}
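As far as I understand librados, “linger” ops are long-lived watch/notify registrations (here on the `notify.*` objects) rather than in-flight I/O, so the empty `"ops"` list is the more telling field. A small hypothetical helper like the one below can summarize saved `objecter_requests` output. It assumes the complete JSON was captured; the `...` elision above would have to be replaced by the real entries for `json.loads` to accept it.

```python
import json


def summarize_objecter_requests(raw: str) -> dict:
    """Summarize the JSON printed by `ceph daemon <rgw> objecter_requests`."""
    data = json.loads(raw)
    linger = data.get("linger_ops", [])
    return {
        # Outstanding RADOS operations; an empty list means nothing is in flight.
        "in_flight_ops": len(data.get("ops", [])),
        # Long-lived watch/notify registrations (assumed interpretation).
        "linger_ops": len(linger),
        "linger_objects": sorted(op.get("object_id", "?") for op in linger),
        # Names of any other *_ops lists that are non-empty.
        "other_busy": sorted(
            key for key, value in data.items()
            if key.endswith("_ops") and key not in ("ops", "linger_ops") and value
        ),
    }
```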


The next thing I tried is `radosgw-admin data sync run --source-zone=left` from the right side. I get bursts of messages of the following form:
2019-04-19 21:46:34.281 7f1c006ad580  0 RGW-SYNC:data:sync:shard[1]: ERROR: failed to read remote data log info: ret=-2
2019-04-19 21:46:34.281 7f1c006ad580  0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2

When I sorted and filtered the messages, each burst contained one RGW-SYNC message for each data log shard of the left zone, identified by the number in “[]”. There are 128 such shards, matching the 128 data sync shards in the status above, so the numbers run from 0 to 127. The bursts happen about once every five seconds.
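The sorting and filtering described above can be scripted. This sketch assumes the log line format shown earlier. It treats any pause longer than one second (an arbitrary, assumed threshold) as the end of a burst. `find_bursts`, `missing_shards` and `burst_intervals` are hypothetical helper names.

```python
import re
from datetime import datetime

# Format taken from the log lines above; the timestamp layout is assumed
# to be "YYYY-MM-DD HH:MM:SS.mmm".
SHARD_ERROR = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) \S+\s+\d+ "
    r"RGW-SYNC:data:sync:shard\[(?P<shard>\d+)\]: ERROR"
)


def find_bursts(lines, gap_seconds=1.0):
    """Group shard error lines into bursts: a new burst starts whenever
    more than `gap_seconds` pass without a matching line."""
    events = []
    for line in lines:
        m = SHARD_ERROR.match(line)
        if m:  # other lines (e.g. "meta sync: ERROR ...") are ignored
            ts = datetime.strptime(f"{m['date']} {m['time']}", "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, int(m["shard"])))
    events.sort()
    bursts = []
    for ts, shard in events:
        if not bursts or (ts - bursts[-1]["end"]).total_seconds() > gap_seconds:
            bursts.append({"start": ts, "end": ts, "shards": set()})
        bursts[-1]["end"] = ts
        bursts[-1]["shards"].add(shard)
    return bursts


def missing_shards(burst, num_shards=128):
    """Shard ids in 0..num_shards-1 that did not log an error in this burst."""
    return sorted(set(range(num_shards)) - burst["shards"])


def burst_intervals(bursts):
    """Seconds between the starts of consecutive bursts."""
    return [
        (later["start"] - earlier["start"]).total_seconds()
        for earlier, later in zip(bursts, bursts[1:])
    ]
```

Running it on the captured output of `data sync run` should show whether every burst covers all shards from 0 to 127 and whether the bursts really are about five seconds apart.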

The packet traces between the nodes during the `data sync run` are mostly requests and responses of the following form:
HTTP GET: http://right01.example.com:7480/admin/log/?type=data&id=7&marker&extra-info=true&rgwx-zonegroup=de533461-2593-45d2-8975-99072d860bb2
HTTP 404 RESPONSE: {"Code":"NoSuchKey","RequestId":"tx000000000000000002a01-005cba9593-371d-right","HostId":"371d-right-us"}
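To see exactly what each captured request asks for, the URL and the 404 body can be decomposed with the Python standard library. This sketch only parses a captured exchange and does not replay it, since the real requests are authenticated between the gateways. The helper name `describe_log_request` is hypothetical. The bare `marker` parameter only survives parsing if `keep_blank_values=True` is passed.

```python
import json
from urllib.parse import parse_qs, urlsplit


def describe_log_request(url: str, body: str) -> dict:
    """Break a captured /admin/log request and its JSON error body into fields."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps the bare "marker" parameter (value "").
    query = parse_qs(parts.query, keep_blank_values=True)
    error = json.loads(body)
    return {
        "host": parts.netloc,
        "path": parts.path,
        "log_type": query["type"][0],
        "shard_id": int(query["id"][0]),
        "marker": query["marker"][0],
        "zonegroup": query["rgwx-zonegroup"][0],
        "error_code": error.get("Code"),
        "request_id": error.get("RequestId"),
    }
```

For the request above, it reports shard 7 with an empty marker and the zonegroup id de533461-2593-45d2-8975-99072d860bb2. It also makes the target host explicit: right01.example.com:7480, which the period configuration below lists as the endpoint of zone “right”.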

When I stop the `data sync run`, these 404s stop. So the `data sync run` is clearly not changing some state in the RGW daemon; it is doing the work itself, synchronously. I have run `data sync init` in the past, but repeating it doesn’t seem likely to make a difference, so I haven’t run it again.

NEXT STEPS:

I am working on getting more detailed logging output from the daemons and hope it reveals something useful. If it does, I will report back so this thread is useful for others. If I have not written back, I probably haven’t found anything, so I would be grateful for any leads.

Kind regards and thank you!

Brian


CONFIG DUMPS:

[root@left01 ~]# radosgw-admin period get-current
{
    "current_period": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c"
}
[root@left01 ~]# radosgw-admin period get cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c
{
    "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
    "epoch": 6,
    "predecessor_uuid": "1f87151a-a1e4-469b-9f90-c309d7b64d80",
    "sync_status": [],
    "period_map": {
        "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
        "zonegroups": [
            {
                "id": "de533461-2593-45d2-8975-99072d860bb2",
                "name": "us",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [
                    "http://left01.example.com:7480"
                ],
                "hostnames": [],
                "hostnames_s3website": [],
                "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
                "zones": [
                    {
                        "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
                        "name": "left",
                        "endpoints": [
                            "http://left01.example.com:7480"
                        ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": [],
                        "redirect_zone": ""
                    },
                    {
                        "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
                        "name": "right",
                        "endpoints": [
                            "http://right01.example.com:7480"
                        ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": [],
                        "redirect_zone": ""
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": [],
                        "storage_classes": [
                            "STANDARD"
                        ]
                    }
                ],
                "default_placement": "default-placement",
                "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
            }
        ],
        "short_zone_ids": [
            {
                "key": "479d3f20-d57d-4b37-995b-510ba10756bf",
                "val": 1817029288
            },
            {
                "key": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
                "val": 1573215025
            }
        ]
    },
    "master_zonegroup": "de533461-2593-45d2-8975-99072d860bb2",
    "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
    "period_config": {
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "user_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        }
    },
    "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7",
    "realm_name": "example-test",
    "realm_epoch": 2
}
[root@left01 ~]# radosgw-admin zonegroup get
{
    "id": "de533461-2593-45d2-8975-99072d860bb2",
    "name": "us",
    "api_name": "us",
    "is_master": "true",
    "endpoints": [
    ],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
    "zones": [
        {
            "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
            "name": "left",
            "endpoints": [
                "http://left01.example.com:7480"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": [],
            "redirect_zone": ""
        },
        {
            "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
            "name": "right",
            "endpoints": [
                "http://right01.example.com:7480"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": [],
            "redirect_zone": ""
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "STANDARD"
            ]
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
}
[root@left01 ~]# radosgw-admin period get
{
    "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
    "epoch": 6,
    "predecessor_uuid": "1f87151a-a1e4-469b-9f90-c309d7b64d80",
    "sync_status": [],
    "period_map": {
        "id": "cdc3d603-2bc8-493b-ba6a-c6a51c49cc0c",
        "zonegroups": [
            {
                "id": "de533461-2593-45d2-8975-99072d860bb2",
                "name": "us",
                "api_name": "us",
                "is_master": "true",
                "endpoints": [
                    "http://left01.example.com:7480"
                ],
                "hostnames": [],
                "hostnames_s3website": [],
                "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
                "zones": [
                    {
                        "id": "479d3f20-d57d-4b37-995b-510ba10756bf",
                        "name": "left",
                        "endpoints": [
                            "http://left01.example.com:7480"
                        ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": [],
                        "redirect_zone": ""
                    },
                    {
                        "id": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
                        "name": "right",
                        "endpoints": [
                            "http://right01.example.com:7480"
                        ],
                        "log_meta": "false",
                        "log_data": "true",
                        "bucket_index_max_shards": 0,
                        "read_only": "false",
                        "tier_type": "",
                        "sync_from_all": "true",
                        "sync_from": [],
                        "redirect_zone": ""
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": [],
                        "storage_classes": [
                            "STANDARD"
                        ]
                    }
                ],
                "default_placement": "default-placement",
                "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7"
            }
        ],
        "short_zone_ids": [
            {
                "key": "479d3f20-d57d-4b37-995b-510ba10756bf",
                "val": 1817029288
            },
            {
                "key": "5dc80bbc-3d9d-46d5-8f3e-4611fbc17fbe",
                "val": 1573215025
            }
        ]
    },
    "master_zonegroup": "de533461-2593-45d2-8975-99072d860bb2",
    "master_zone": "479d3f20-d57d-4b37-995b-510ba10756bf",
    "period_config": {
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "user_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        }
    },
    "realm_id": "d5078dd2-6a6e-49f8-941e-55c02ad58af7",
    "realm_name": "example-test",
    "realm_epoch": 2
}
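The committed period and the local zonegroup can be compared mechanically rather than by eye. As pasted above, `zonegroup get` shows an empty `endpoints` list for the zonegroup, while the period lists http://left01.example.com:7480. Whether that difference matters here is not established. The sketch below assumes both outputs were saved as valid JSON files and reports the top-level zonegroup fields that differ. The file names and function names are hypothetical.

```python
import json


def load_json(path: str) -> dict:
    """Read a saved radosgw-admin JSON dump."""
    with open(path) as f:
        return json.load(f)


def diff_zonegroup(period: dict, zonegroup: dict) -> list:
    """Return (field, committed_value, local_value) for every top-level
    zonegroup field that differs between the period and `zonegroup get`."""
    committed = next(
        (zg for zg in period["period_map"]["zonegroups"] if zg["id"] == zonegroup["id"]),
        None,
    )
    if committed is None:
        raise KeyError(f"zonegroup {zonegroup['id']} is not in this period")
    return [
        (key, committed.get(key), zonegroup.get(key))
        for key in sorted(set(committed) | set(zonegroup))
        if committed.get(key) != zonegroup.get(key)
    ]
```

For example, run `radosgw-admin period get > period.json` and `radosgw-admin zonegroup get > zonegroup.json`. Then `diff_zonegroup(load_json("period.json"), load_json("zonegroup.json"))` lists any mismatches, including differences inside nested fields such as `zones`.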

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
