RGW multisite replication doesn't start

Hi *,

I have 2 virtual one-node clusters configured for RGW multisite. Initially the replication actually worked for a few hundred MB or so, and then it stopped. In the meantime I have wiped both RGWs twice (including wiping all pools clean) to make sure the configuration is right. I don't see any errors in the logs, but nothing happens on the secondary site. Both clusters are healthy and the RGWs run with HTTPS. Uploading data directly to the secondary site also works, so the configuration seems OK to me.

This is the current RGW sync status:

---snip---
primary:~ # radosgw-admin sync status
          realm c7d5fd30-9c06-46a1-baf4-497f95bf3abc (hamburg)
      zonegroup 68adec15-aace-403d-bd63-f5182a6437b1 (zg-hamburg)
           zone 0fb33fa1-8110-4179-ae45-acf5f5f825c5 (z-primary)
  metadata sync no sync (zone is master)


secondary:~ # radosgw-admin sync status
2020-09-17T09:34:59.593+0200 7fdd3e706a40 1 Cannot find zone id=93ece7a6-beef-4f4e-841a-60ba0405f192 (name=z-secondary), switching to local zonegroup configuration
          realm c7d5fd30-9c06-46a1-baf4-497f95bf3abc (hamburg)
      zonegroup 68adec15-aace-403d-bd63-f5182a6437b1 (zg-hamburg)
           zone 93ece7a6-beef-4f4e-841a-60ba0405f192 (z-secondary)
  metadata sync syncing
                full sync: 64/64 shards
                full sync: 3 entries to sync
                incremental sync: 0/64 shards
                metadata is behind on 64 shards
behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]
      data sync source: 0fb33fa1-8110-4179-ae45-acf5f5f825c5 (z-primary)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
---snip---


Since the data was not replicated, I ran 'radosgw-admin metadata sync run --source-zone=z-primary', but it never finishes. If I do the same for data, it shows that all shards are behind on data, but nothing happens either. I also don't understand the 'Cannot find zone id=93ece7a6-beef-4f4e-841a-60ba0405f192 (name=z-secondary), switching to local zonegroup configuration' message. It didn't break the replication on the first attempt, so I ignored it. Or is this something I should fix first (and if so, how)?
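For reference, here is a summary of what I ran on the secondary, plus what I was planning to check next. The "next" commands are just my guesses from the radosgw-admin man page, so please correct me if they are the wrong approach:

```shell
# Commands I already ran on the secondary (results described above):
radosgw-admin metadata sync run --source-zone=z-primary   # never finishes
radosgw-admin data sync run --source-zone=z-primary       # shards stay behind, nothing happens

# What I was planning to check next (assumptions on my part):
radosgw-admin sync error list     # any recorded sync errors?
radosgw-admin period get          # does the current period match on both zones?
radosgw-admin metadata sync status

# Re-running the sync with verbose logging to get more detail:
radosgw-admin metadata sync run --source-zone=z-primary --debug-rgw=20 --debug-ms=1
```

These all need a live cluster, of course; I can post the output of any of them if that helps.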

Can anyone point me to what's going on here? I can provide more details if necessary, just let me know.

Thank you!
Eugen
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


