Hi list,

I'm currently implementing a sync between Ceph and a MinIO cluster to continuously replicate the buckets and objects to an offsite location. I followed the guide at https://croit.io/blog/setting-up-ceph-cloud-sync-module

After the sync starts it successfully creates the first bucket, but then it somehow keeps trying to create that same bucket over and over again instead of uploading the objects themselves. This is from the MinIO logs:

------------
2022-11-21T10:20:55.776 [200 OK] s3.PutBucket [2a02::...]:9000/rgw-default-61727a643fba391a [2a02::...] 1.592ms ↑ 78 B ↓ 0 B
2022-11-21T10:20:55.778 [409 Conflict] s3.PutBucket [2a02::...]:9000/rgw-default-61727a643fba391a [2a02::...] 649µs ↑ 78 B ↓ 386 B
(repeats over and over again)
------------

This is my cloud sync zone config:

------------
{
    "id": "7185f1a9-f33b-41d3-8906-634ac096d4a9",
    "name": "backup",
    "domain_root": "backup.rgw.meta:root",
    "control_pool": "backup.rgw.control",
    "gc_pool": "backup.rgw.log:gc",
    "lc_pool": "backup.rgw.log:lc",
    "log_pool": "backup.rgw.log",
    "intent_log_pool": "backup.rgw.log:intent",
    "usage_log_pool": "backup.rgw.log:usage",
    "roles_pool": "backup.rgw.meta:roles",
    "reshard_pool": "backup.rgw.log:reshard",
    "user_keys_pool": "backup.rgw.meta:users.keys",
    "user_email_pool": "backup.rgw.meta:users.email",
    "user_swift_pool": "backup.rgw.meta:users.swift",
    "user_uid_pool": "backup.rgw.meta:users.uid",
    "otp_pool": "backup.rgw.otp",
    "system_key": {
        "access_key": "<REDACTED>",
        "secret_key": "<REDACTED>"
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "backup.rgw.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "backup.rgw.buckets.data"
                    }
                },
                "data_extra_pool": "backup.rgw.buckets.non-ec",
                "index_type": 0
            }
        }
    ],
    "tier_config": {
        "connection": {
            "access_key": "<REDACTED>",
            "endpoint": "http://[<REDACTED>]:9000",
            "secret": "<REDACTED>"
        }
    },
    "realm_id": "d1e9f0cd-c965-44c6-a4bd-b7704cab9c4e",
    "notif_pool": "backup.rgw.log:notif"
}
------------

This is the sync status:

------------
# radosgw-admin sync status --rgw-zone=backup
          realm d1e9f0cd-c965-44c6-a4bd-b7704cab9c4e (defaultrealm)
      zonegroup cee2848e-368f-45d3-8310-caab37b022a7 (default)
           zone 7185f1a9-f33b-41d3-8906-634ac096d4a9 (backup)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: a14cce61-8951-438f-89f6-4e65637e2941 (default)
                        syncing
                        full sync: 128/128 shards
                        full sync: 3404 buckets to sync
                        incremental sync: 0/128 shards
                        data is behind on 128 shards
                        behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
                        1 shards are recovering
                        recovering shards: [18]
------------
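In case it helps, I can pull more detail on the data sync from the backup zone. These are the commands I would use (just a sketch, the bucket name is a placeholder):

------------
# list errors recorded by the sync machinery in the backup zone
radosgw-admin sync error list --rgw-zone=backup

# data sync state against the source zone
radosgw-admin data sync status --source-zone=default --rgw-zone=backup

# per-bucket sync state for one of the affected buckets
radosgw-admin bucket sync status --bucket=<bucket> --rgw-zone=backup
------------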
This is the output on the receiving RGW:

------------
Nov 21 10:52:15 <hostname> radosgw[510538]: rgw rados thread: AWS: creating bucket rgw-default-61727a643fba391a
Nov 21 10:52:15 <hostname> radosgw[510538]: rgw rados thread: AWS: download begin: z=a14cce61-8951-438f-89f6-4e65637e2941 b=:<REDACTED>[a14cce61-8951-438f-89f6-4e65637e2941.28999987.12]) k=0002cb99-3faa-42e1-a760-364c4ffba982 size=23537 mtime=2022-04-03T14:48:06.449064+0000 etag=0325bf0901634ce13405bea67767b8f4 zone_short_id=0 pg_ver=744888
Nov 21 10:52:15 <hostname> radosgw[510538]: rgw rados thread: AWS: creating bucket rgw-default-61727a643fba391a
Nov 21 10:52:15 <hostname> radosgw[510538]: failed to wait for op, ret=-39: PUT http://[<REDACTED>]:9000/rgw-default-61727a643fba391a
------------

After some retries the receiving RGW crashes with the following message:

------------
Nov 21 10:52:15 prod-backup-201.ceph.plusline.net radosgw[510538]: *** Caught signal (Segmentation fault) **
 in thread 7f8cff660700 thread_name:data-sync

 ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f8d29e6ece0]
 2: /lib64/libc.so.6(+0xcfd02) [0x7f8d28540d02]
 3: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x95) [0x7f8d2cae2ef5]
 4: (RGWAWSStreamObjToCloudPlainCR::operate(DoutPrefixProvider const*)+0x255) [0x7f8d2ce581f5]
 5: (RGWCoroutinesStack::operate(DoutPrefixProvider const*, RGWCoroutinesEnv*)+0x15c) [0x7f8d2cefe03c]
 6: (RGWCoroutinesManager::run(DoutPrefixProvider const*, std::__cxx11::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x296) [0x7f8d2cefee66]
 7: (RGWCoroutinesManager::run(DoutPrefixProvider const*, RGWCoroutine*)+0x91) [0x7f8d2cf00131]
 8: (RGWRemoteDataLog::run_sync(DoutPrefixProvider const*, int)+0x1b4) [0x7f8d2cdef104]
 9: (RGWDataSyncProcessorThread::process(DoutPrefixProvider const*)+0x59) [0x7f8d2cfcfde9]
 10: (RGWRadosThread::Worker::entry()+0x13a) [0x7f8d2cf8e30a]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7f8d29e641ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
------------

I can also see on the originating RGW servers that the objects are being requested successfully.

Any ideas about the root cause would be welcome.

Best regards,
Matthias
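P.S. For completeness, the backup zone and its cloud tier were set up following the steps from the guide linked above; roughly like this (abridged from memory, keys and endpoint are placeholders):

------------
# create the backup zone with the cloud sync tier type
radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=backup --tier-type=cloud

# point the cloud tier at the MinIO endpoint
radosgw-admin zone modify --rgw-zone=backup \
  --tier-config=connection.endpoint=http://[<REDACTED>]:9000,connection.access_key=<REDACTED>,connection.secret=<REDACTED>

# commit the updated period
radosgw-admin period update --commit
------------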