Hi,
I have also been looking for ways to improve sync. I have two clusters, 25 ms RTT apart, with RGW multi-site configured and all nodes running 12.2.12. Each site has three rgw nodes behind haproxy. There is a 1G circuit between the sites, and bandwidth usage averages 370Mb/s. I can put [with swift] to the remote site at wire speed.
Logs on the receiving site show plenty of messages like this:
heartbeat_map is_healthy 'RGWAsyncRadosProcessor::m_tp thread 0x7f16e022d700' had timed out after 600
..but it all works, albeit slowly. What should my next step be in researching a resolution for this?
peter
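One way to quantify the heartbeat warnings before digging further is to tally them per thread. The following is a minimal sketch, assuming the log lines look like the one quoted above; the function name is made up for illustration, and other Ceph releases may format the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the single log line quoted in this message.
TIMEOUT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<pool>\S+) thread (?P<tid>0x[0-9a-f]+)' "
    r"had timed out after (?P<secs>\d+)"
)


def count_heartbeat_timeouts(lines):
    """Count timeout warnings, keyed by (thread pool name, thread id)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("pool"), match.group("tid"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "heartbeat_map is_healthy 'RGWAsyncRadosProcessor::m_tp thread "
        "0x7f16e022d700' had timed out after 600",
        "some unrelated log line",
    ]
    for (pool, tid), n in count_heartbeat_timeouts(sample).items():
        print(f"{pool} {tid}: {n} timeout(s)")
```

Feeding it the radosgw log from each gateway would show whether the warnings cluster on a few threads or are spread evenly.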
On 7/17/19, 8:44 AM, "ceph-users on behalf of Casey Bodley" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of cbodley@xxxxxxxxxx> wrote:
On 7/17/19 8:04 AM, P. O. wrote:
> Hi,
> Is there any mechanism inside the rgw that can detect faulty endpoints
> for a configuration with multiple endpoints?
No, replication requests that fail just get retried using round robin
until they succeed. If an endpoint isn't available, we assume it will
come back eventually and keep trying.
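To illustrate the behaviour described above, here is a rough Python sketch of retrying over a list of endpoints in round-robin order. It is not the RGW implementation; the function names, the example hostnames, and the `max_attempts` cap (added only so the demo can terminate) are all assumptions.

```python
import itertools


def fetch_with_round_robin(endpoints, fetch, max_attempts=None):
    """Try endpoints in rotation until fetch() succeeds.

    No endpoint is ever marked as faulty: a failed one is simply
    tried again the next time the rotation reaches it.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    attempts = 0
    for endpoint in itertools.cycle(endpoints):
        attempts += 1
        try:
            return endpoint, fetch(endpoint)
        except ConnectionError:
            # Hypothetical cap; with max_attempts=None this retries forever.
            if max_attempts is not None and attempts >= max_attempts:
                raise


if __name__ == "__main__":
    def fake_fetch(endpoint):
        # Simulate one gateway being down.
        if "rgw1" in endpoint:
            raise ConnectionError(endpoint)
        return "ok"

    urls = ["http://rgw1.example:8080", "http://rgw2.example:8080"]
    print(fetch_with_round_robin(urls, fake_fetch))
```

In this model an unavailable endpoint costs one failed attempt per rotation and nothing more.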
> Is there any advantage related to the number of replication
> endpoints? Can I expect improved replication performance (the more
> synchronization rgws = the faster replication)?
These endpoints act as the server side of replication, and handle GET
requests from other zones to read replication logs and fetch objects. As
long as the number of gateways on the client side of replication (i.e.
gateways on other zones that have rgw_run_sync_thread enabled, which is
on by default) scale along with these replication endpoints, you can
expect a modest improvement in replication, though it's limited by the
available bandwidth between sites. Spreading replication endpoints over
several gateways also helps to limit the impact of replication on the
local client workloads.
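As a sketch of how several gateways might be listed as replication endpoints for a zone, the snippet below only builds the `radosgw-admin` command lines and does not run them. The hostnames are placeholders, the helper name is made up, and the comma-separated `--endpoints` form should be checked against the documentation for your release.

```python
import shlex


def zone_endpoint_commands(zone_name, endpoints):
    """Build the commands that set a zone's endpoints and commit the period."""
    modify = [
        "radosgw-admin", "zone", "modify",
        f"--rgw-zone={zone_name}",
        "--endpoints=" + ",".join(endpoints),
    ]
    commit = ["radosgw-admin", "period", "update", "--commit"]
    return [modify, commit]


if __name__ == "__main__":
    # Placeholder gateway addresses, not taken from this thread.
    gateways = ["http://rgw1.example:8080", "http://rgw2.example:8080"]
    for argv in zone_endpoint_commands("primary_1", gateways):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before running anything against the master zone.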
>
>
> On Wednesday, 17 July 2019, P. O. <posdub@xxxxxxxxx
> <mailto:posdub@xxxxxxxxx>> wrote:
>
> Hi,
>
> Is there any mechanism inside the rgw that can detect faulty
> endpoints for a configuration with multiple endpoints? Is there
> any advantage related to the number of replication endpoints?
> Can I expect improved replication performance (the more synchronization rgws = the faster replication)?
>
>
> On Tuesday, 16 July 2019, Casey Bodley <cbodley@xxxxxxxxxx
> <mailto:cbodley@xxxxxxxxxx>> wrote:
>
> We used to have issues when a load balancer was in front of
> the sync endpoints, because our http client didn't time out
> stalled connections. Those are resolved in luminous, but we
> still recommend using the radosgw addresses directly to avoid
> shoveling data through an extra proxy. Internally, sync is
> already doing a round robin over that list of endpoints. On
> the other hand, load balancers give you some extra
> flexibility, like adding/removing gateways without having to
> update the global multisite configuration.
>
> On 7/16/19 2:52 PM, P. O. wrote:
>
> Hi all,
>
> I have a multisite RGW setup with one zonegroup and two
> zones. Each zone has one endpoint configured as shown below:
>
> "zonegroups": [
> {
> ...
> "is_master": "true",
> "endpoints": ["https://nam02.safelinks.protection.outlook.com/?url="],
> "zones": [
> {
> "name": "primary_1",
> "endpoints": ["https://nam02.safelinks.protection.outlook.com/?url="]
> },
> {
> "name": "secondary_1",
> "endpoints": ["https://nam02.safelinks.protection.outlook.com/?url="]
> }
> ],
>
> My question is: what is the best practice for configuring
> synchronization endpoints?
>
> 1) Should endpoints be behind a load balancer? For example,
> two synchronization endpoints per zone, with only the load
> balancer's address in the "endpoints" section?
> 2) Should endpoints be behind round-robin DNS?
> 3) Can I set the RGW addresses directly in the "endpoints" section?
> For example:
>
> "zones": [
> {
> "name": "primary_1",
> "endpoints": ["https://nam02.safelinks.protection.outlook.com/?url=",
> "https://nam02.safelinks.protection.outlook.com/?url="]
> },
> {
> "name": "secondary_1",
> "endpoints": ["https://nam02.safelinks.protection.outlook.com/?url=",
> "https://nam02.safelinks.protection.outlook.com/?url="]
> }
>
> Are there any advantages to the third option? I mean a speed-up
> of synchronization, for example.
>
> What recommendations do you have for configuring
> the endpoints in prod environments?
>
> Best regards,
> Dun F.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
> https://nam02.safelinks.protection.outlook.com/?url=
>
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com