Re: Multisite RGW - endpoints configuration

Peter Eisch <peter.eisch@xxxxxxxxxxxxxxx> · Wed, 17 Jul 2019 17:06:33 +0000

Hi,

I have also been looking for ways to improve sync.  I have two clusters, 25 ms RTT apart, with RGW multi-site configured and all nodes running 12.2.12.  Each site has three rgw nodes behind haproxy.  There is a 1G circuit between the sites, and bandwidth usage averages 370Mb/s.  I can put [with swift] to the remote site at wire speed.

Logs on the receiving site show plenty of messages like this:
heartbeat_map is_healthy 'RGWAsyncRadosProcessor::m_tp thread 0x7f16e022d700' had timed out after 600

..but it all works, albeit slowly.  What should my next step be in researching a resolution for this?

peter

Peter Eisch
Senior Site Reliability Engineer
On 7/17/19, 8:44 AM, "ceph-users on behalf of Casey Bodley" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of cbodley@xxxxxxxxxx> wrote: 

    On 7/17/19 8:04 AM, P. O. wrote:
    > Hi,
    > Is there any mechanism inside the rgw that can detect faulty endpoints
    > for a configuration with multiple endpoints?

    No, replication requests that fail just get retried using round robin
    until they succeed. If an endpoint isn't available, we assume it will
    come back eventually and keep trying.
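
A minimal Python sketch of the retry behaviour described above, for illustration only. It is not RGW's actual code: the function name, the `send` callable, and `max_attempts` are hypothetical. The sketch restarts at the first endpoint on every call, which real sync need not do.

```python
import itertools


def fetch_with_round_robin(endpoints, send, max_attempts=None):
    """Call send(endpoint) over endpoints in round-robin order until one succeeds.

    send is a hypothetical callable that raises ConnectionError on failure.
    max_attempts exists only so the sketch can terminate; the behaviour
    described in the thread keeps retrying indefinitely.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    attempts = 0
    for endpoint in itertools.cycle(endpoints):
        if max_attempts is not None and attempts >= max_attempts:
            raise RuntimeError("gave up after %d attempts" % attempts)
        attempts += 1
        try:
            return endpoint, send(endpoint)
        except ConnectionError:
            # No health tracking: the failed endpoint stays in the rotation
            # and is tried again on a later pass.
            continue
```

Because nothing marks an endpoint as bad, a dead gateway in the list costs one failed attempt on each pass rather than being skipped.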

    > Is there any advantage related with the number of replication
    > endpoints? Can I expect improved replication performance (the more
    > synchronization rgws = the faster replication)?

    These endpoints act as the server side of replication, and handle GET
    requests from other zones to read replication logs and fetch objects. As
    long as the number of gateways on the client side of replication (i.e.
    gateways on other zones that have rgw_run_sync_thread enabled, which is
    on by default) scales along with these replication endpoints, you can
    expect a modest improvement in replication, though it's limited by the
    available bandwidth between sites. Spreading replication endpoints over
    several gateways also helps to limit the impact of replication on the
    local client workloads.
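
The scaling argument above can be illustrated with a toy model in Python. This is only a sketch under loud assumptions: the function, the idea that each sync gateway pairs with one endpoint, and the per-gateway rate are all hypothetical and are not measurements from RGW. The only point it shows is that adding gateways helps until the inter-site link becomes the cap.

```python
def estimated_sync_rate_mbps(sync_gateways, endpoints, per_gateway_mbps, link_mbps):
    """Toy estimate of aggregate replication throughput in Mb/s.

    Assumptions (hypothetical, for illustration only):
      - each syncing gateway is served by one replication endpoint, so the
        smaller of the two counts limits parallelism;
      - every such pair sustains per_gateway_mbps;
      - the inter-site link caps the total at link_mbps.
    """
    usable_pairs = min(sync_gateways, endpoints)
    return min(usable_pairs * per_gateway_mbps, link_mbps)
```

With a made-up rate of 150 Mb/s per gateway and a 1000 Mb/s link, three gateways against three endpoints give 450 Mb/s in this model, while ten against ten are capped at 1000 Mb/s by the link.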

    >
    >
    > On Wednesday, 17 July 2019, P. O. <posdub@xxxxxxxxx> wrote:
    >
    >     Hi,
    >
    >     Is there any mechanism inside the rgw that can detect faulty
    >     endpoints for a configuration with multiple endpoints? Is there
    >     any advantage related with the number of replication endpoints?
    >     Can I expect improved replication performance (the more synchronization rgws = the faster replication)?
    >
    >
    >     On Tuesday, 16 July 2019, Casey Bodley <cbodley@xxxxxxxxxx> wrote:
    >
    >         We used to have issues when a load balancer was in front of
    >         the sync endpoints, because our http client didn't time out
    >         stalled connections. Those are resolved in luminous, but we
    >         still recommend using the radosgw addresses directly to avoid
    >         shoveling data through an extra proxy. Internally, sync is
    >         already doing a round robin over that list of endpoints. On
    >         the other hand, load balancers give you some extra
    >         flexibility, like adding/removing gateways without having to
    >         update the global multisite configuration.
    >
    >         On 7/16/19 2:52 PM, P. O. wrote:
    >
    >             Hi all,
    >
    >             I have a multisite RGW setup with one zonegroup and two
    >             zones. Each zone has one endpoint configured like below:
    >
    >             "zonegroups": [
    >             {
    >              ...
    >              "is_master": "true",
    >              "endpoints": [ ... ],
    >              "zones": [
    >                {
    >                  "name": "primary_1",
    >                  "endpoints": [ ... ]
    >                },
    >                {
    >                  "name": "secondary_1",
    >                  "endpoints": [ ... ]
    >                }
    >              ],
    >
    >             My question is: what is the best practice for configuring
    >             synchronization endpoints?
    >
    >             1) Should endpoints be behind a load balancer? For example,
    >             two synchronization endpoints per zone, and only the load
    >             balancer's address in the "endpoints" section?
    >             2) Should endpoints be behind round-robin DNS?
    >             3) Can I set RGW addresses directly in the endpoints section?
    >             For example:
    >
    >              "zones": [
    >                {
    >                  "name": "primary_1",
    >                  "endpoints": [ ... ,
    >                                 ... ]
    >                },
    >                {
    >                  "name": "secondary_1",
    >                  "endpoints": [ ... ,
    >                                 ... ]
    >                }
    >
    >             Are there any advantages to the third option? I mean faster
    >             synchronization, for example.
    >
    >             What recommendations do you have for configuring the
    >             endpoints in production environments?
    >
    >             Best regards,
    >             Dun F.
    >
    >             _______________________________________________
    >             ceph-users mailing list
    >             ceph-users@xxxxxxxxxxxxxx
    >             http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com