Hi list,

I'm currently implementing a sync between Ceph and a MinIO cluster to continuously replicate the buckets and objects to an offsite location. I followed the guide at https://croit.io/blog/setting-up-ceph-cloud-sync-module

After the sync starts it successfully creates the first bucket, but then it somehow keeps trying to create that same bucket over and over again instead of uploading the objects themselves. This is from the MinIO logs:

------------
2022-11-21T10:20:55.776 [200 OK] s3.PutBucket [2a02::...]:9000/rgw-default-61727a643fba391a [2a02::...] 1.592ms ↑ 78 B ↓ 0 B
2022-11-21T10:20:55.778 [409 Conflict] s3.PutBucket [2a02::...]:9000/rgw-default-61727a643fba391a [2a02::...] 649µs ↑ 78 B ↓ 386 B
(repeats over and over again)
------------

This is my cloud sync zone config:

------------
{
    "id": "7185f1a9-f33b-41d3-8906-634ac096d4a9",
    "name": "backup",
    "domain_root": "backup.rgw.meta:root",
    "control_pool": "backup.rgw.control",
    "gc_pool": "backup.rgw.log:gc",
    "lc_pool": "backup.rgw.log:lc",
    "log_pool": "backup.rgw.log",
    "intent_log_pool": "backup.rgw.log:intent",
    "usage_log_pool": "backup.rgw.log:usage",
    "roles_pool": "backup.rgw.meta:roles",
    "reshard_pool": "backup.rgw.log:reshard",
    "user_keys_pool": "backup.rgw.meta:users.keys",
    "user_email_pool": "backup.rgw.meta:users.email",
    "user_swift_pool": "backup.rgw.meta:users.swift",
    "user_uid_pool": "backup.rgw.meta:users.uid",
    "otp_pool": "backup.rgw.otp",
    "system_key": {
        "access_key": "<REDACTED>",
        "secret_key": "<REDACTED>"
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "backup.rgw.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "backup.rgw.buckets.data"
                    }
                },
                "data_extra_pool": "backup.rgw.buckets.non-ec",
                "index_type": 0
            }
        }
    ],
    "tier_config": {
        "connection": {
            "access_key": "<REDACTED>",
            "endpoint": "http://[<REDACTED>]:9000",
            "secret": "<REDACTED>"
        }
    },
    "realm_id": "d1e9f0cd-c965-44c6-a4bd-b7704cab9c4e",
    "notif_pool": "backup.rgw.log:notif"
}
------------

This is the sync status:

------------
# radosgw-admin sync status --rgw-zone=backup
          realm d1e9f0cd-c965-44c6-a4bd-b7704cab9c4e (defaultrealm)
      zonegroup cee2848e-368f-45d3-8310-caab37b022a7 (default)
           zone 7185f1a9-f33b-41d3-8906-634ac096d4a9 (backup)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: a14cce61-8951-438f-89f6-4e65637e2941 (default)
                        syncing
                        full sync: 128/128 shards
                        full sync: 3404 buckets to sync
                        incremental sync: 0/128 shards
                        data is behind on 128 shards
                        behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
                        1 shards are recovering
                        recovering shards: [18]
------------
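In case it helps, I can pull more detail on the data sync from the backup zone. These are the commands I would use (just a sketch, the bucket name is a placeholder):

------------
# list errors recorded by the sync machinery in the backup zone
radosgw-admin sync error list --rgw-zone=backup

# data sync state against the source zone
radosgw-admin data sync status --source-zone=default --rgw-zone=backup

# per-bucket sync state for one of the affected buckets
radosgw-admin bucket sync status --bucket=<bucket> --rgw-zone=backup
------------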
This is the output on the receiving RGW:

------------
Nov 21 10:52:15 <hostname> radosgw[510538]: rgw rados thread: AWS: creating bucket rgw-default-61727a643fba391a
Nov 21 10:52:15 <hostname> radosgw[510538]: rgw rados thread: AWS: download begin: z=a14cce61-8951-438f-89f6-4e65637e2941 b=:<REDACTED>[a14cce61-8951-438f-89f6-4e65637e2941.28999987.12]) k=0002cb99-3faa-42e1-a760-364c4ffba982 size=23537 mtime=2022-04-03T14:48:06.449064+0000 etag=0325bf0901634ce13405bea67767b8f4 zone_short_id=0 pg_ver=744888
Nov 21 10:52:15 <hostname> radosgw[510538]: rgw rados thread: AWS: creating bucket rgw-default-61727a643fba391a
Nov 21 10:52:15 <hostname> radosgw[510538]: failed to wait for op, ret=-39: PUT http://[<REDACTED>]:9000/rgw-default-61727a643fba391a
------------

After some retries the receiving RGW crashes with the following message:

------------
Nov 21 10:52:15 prod-backup-201.ceph.plusline.net radosgw[510538]: *** Caught signal (Segmentation fault) **
 in thread 7f8cff660700 thread_name:data-sync

 ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f8d29e6ece0]
 2: /lib64/libc.so.6(+0xcfd02) [0x7f8d28540d02]
 3: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x95) [0x7f8d2cae2ef5]
 4: (RGWAWSStreamObjToCloudPlainCR::operate(DoutPrefixProvider const*)+0x255) [0x7f8d2ce581f5]
 5: (RGWCoroutinesStack::operate(DoutPrefixProvider const*, RGWCoroutinesEnv*)+0x15c) [0x7f8d2cefe03c]
 6: (RGWCoroutinesManager::run(DoutPrefixProvider const*, std::__cxx11::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x296) [0x7f8d2cefee66]
 7: (RGWCoroutinesManager::run(DoutPrefixProvider const*, RGWCoroutine*)+0x91) [0x7f8d2cf00131]
 8: (RGWRemoteDataLog::run_sync(DoutPrefixProvider const*, int)+0x1b4) [0x7f8d2cdef104]
 9: (RGWDataSyncProcessorThread::process(DoutPrefixProvider const*)+0x59) [0x7f8d2cfcfde9]
 10: (RGWRadosThread::Worker::entry()+0x13a) [0x7f8d2cf8e30a]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7f8d29e641ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
------------

I can also see on the originating RGW servers that the objects are being requested successfully.

Any ideas about the root cause would be welcome.

Best regards,
Matthias
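P.S. For completeness, the backup zone and its cloud tier were set up following the steps from the guide linked above; roughly like this (abridged from memory, keys and endpoint are placeholders):

------------
# create the backup zone with the cloud sync tier type
radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=backup --tier-type=cloud

# point the cloud tier at the MinIO endpoint
radosgw-admin zone modify --rgw-zone=backup \
  --tier-config=connection.endpoint=http://[<REDACTED>]:9000,connection.access_key=<REDACTED>,connection.secret=<REDACTED>

# commit the updated period
radosgw-admin period update --commit
------------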