Re: RGW multisite metadata sync issue

I also see this in the output of radosgw-admin metadata sync status. It
seems odd to me, because I would expect a marker here for the sync to follow:
            {
                "key": 0,
                "val": {
                    "state": 0,
                    "marker": "",
                    "next_step_marker": "1_1730469205.875723_877487777.1",
                    "total_entries": 174,
                    "pos": 0,
                    "timestamp": "2024-11-01T13:53:25.875723Z",
                    "realm_epoch": 0
                }
            }
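Incidentally, the next_step_marker appears to embed the same instant as the
"timestamp" field. Assuming the middle field of the marker is a Unix epoch
(my reading of the marker format, not documented behaviour), it decodes
cleanly:

```python
from datetime import datetime, timezone

marker = "1_1730469205.875723_877487777.1"

# Take the middle field and keep only the integer-seconds part.
epoch_secs = int(marker.split("_")[1].split(".")[0])
ts = datetime.fromtimestamp(epoch_secs, tz=timezone.utc)
print(ts.isoformat())  # -> 2024-11-01T13:53:25+00:00, matching "timestamp"
```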

On Mon, Dec 16, 2024 at 1:24 PM Vahideh Alinouri <vahideh.alinouri@xxxxxxxxx>
wrote:

> I also see this log in the RGW log:
>
> 2024-12-16T12:23:58.651+0000 7f9b2b9fe700  1 ====== starting new request
> req=0x7f9ad9959730 =====
> 2024-12-16T12:23:58.651+0000 7f9b2b9fe700 -1 req 11778501317150336521
> 0.000000000s :list_data_changes_log int
> rgw::cls::fifo::{anonymous}::list_part(const DoutPrefixProvider*,
> librados::v14_2_0::IoCtx&, const string&,
> std::optional<std::basic_string_view<char> >, uint64_t, uint64_t,
> std::vector<rados::cls::fifo::part_list_entry>*, bool*, bool*,
> std::string*, uint64_t, optional_yield):245 fifo::op::LIST_PART failed
> r=-34 tid=4176
> 2024-12-16T12:23:58.651+0000 7f9b2b9fe700 -1 req 11778501317150336521
> 0.000000000s :list_data_changes_log int rgw::cls::fifo::FIFO::list(const
> DoutPrefixProvider*, int, std::optional<std::basic_string_view<char> >,
> std::vector<rgw::cls::fifo::list_entry>*, bool*, optional_yield):1660
> list_entries failed: r=-34 tid= 4176
> 2024-12-16T12:23:58.651+0000 7f9b2b9fe700 -1 req 11778501317150336521
> 0.000000000s :list_data_changes_log virtual int
> RGWDataChangesFIFO::list(const DoutPrefixProvider*, int, int,
> std::vector<rgw_data_change_log_entry>&,
> std::optional<std::basic_string_view<char> >, std::string*, bool*): unable
> to list FIFO: data_log.44: (34) Numerical result out of range
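(For anyone following along: the negative return codes in these logs are just
negated errnos, which is standard Linux convention rather than anything
RGW-specific. A quick decode:)

```python
import errno
import os

# RGW logs return codes as negated errnos.
for r in (-34, -2):
    name = errno.errorcode[-r]
    print(f"r={r}: {name} ({os.strerror(-r)})")

# r=-34 -> ERANGE, exactly the "(34) Numerical result out of range"
# spelled out at the end of the FIFO error above.
# r=-2  -> ENOENT, i.e. "no such file/object", which is the ret=-2 in
# the omap-keys errors quoted further down in this thread.
```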
>
> On Sun, Dec 15, 2024 at 10:45 PM Vahideh Alinouri <
> vahideh.alinouri@xxxxxxxxx> wrote:
>
>> Hi guys,
>>
>> My Ceph release is Quincy 17.2.5. I need to change the master zone to
>> decommission the old one and upgrade all zones. I have separated the client
>> traffic and sync traffic in RGWs, meaning there are separate RGW daemons
>> handling the sync process.
>>
>> I encountered an issue when trying to sync one of the zones in the
>> zonegroup. The data sync is proceeding fine, but I have an issue with the
>> metadata sync. It gets stuck behind on a shard. Here is the output from radosgw-admin
>> sync status:
>>
>> metadata sync syncing
>>     full sync: 1/64 shards
>>     full sync: 135 entries to sync
>>     incremental sync: 63/64 shards
>>     metadata is behind on 1 shard
>>     behind shards: [0]
>>
>> In the RGW log, I see this error:
>> 2024-12-15T21:30:59.641+0000 7f6dff472700 1 beast: 0x7f6d2f1cf730:
>> 172.19.66.112 - s3-cdn-user [15/Dec/2024:21:30:59.641 +0000] "GET
>> /admin/log/?type=data&id=56&marker=00000000000000000000%3A00000000000000204086&extra-info=true&rgwx-zonegroup=7c01d60f-88c6-4192-baf7-d725260bf05d
>> HTTP/1.1" 200 44 - - - latency=0.000000000s
>> 2024-12-15T21:30:59.701+0000 7f6e44d1e700 0 meta sync: ERROR:
>> full_sync(): RGWRadosGetOmapKeysCR() returned ret=-2
>> 2024-12-15T21:30:59.701+0000 7f6e44d1e700 0 RGW-SYNC:meta:shard[0]:
>> ERROR: failed to list omap keys, status=-2
>> 2024-12-15T21:30:59.701+0000 7f6e44d1e700 0 meta sync: ERROR:
>> RGWBackoffControlCR called coroutine returned -2
>> 2024-12-15T21:31:00.705+0000 7f6e44d1e700 0 meta sync: ERROR:
>> full_sync(): RGWRadosGetOmapKeysCR() returned ret=-2
>> 2024-12-15T21:31:00.705+0000 7f6e44d1e700 0 RGW-SYNC:meta:shard[0]:
>> ERROR: failed to list omap keys, status=-2
>> 2024-12-15T21:31:00.705+0000 7f6e44d1e700 0 meta sync: ERROR:
>> RGWBackoffControlCR called coroutine returned -2
>>
>> I’ve tried the following steps:
>>
>>    - Changed the PG number of the metadata pool to force a rebalance,
>>    but everything was fine.
>>    - Ran metadata sync init and then re-ran the sync.
>>    - Restarted RGW services in both the zone and the master zone.
>>    - Created a user in the master zone to ensure metadata sync works,
>>    which was successful.
>>    - Checked OSD logs but didn’t see any specific errors.
>>    - Attempted to list metadata in the pool using rados ls -p
>>    s3-cdn-dc07.rgw.meta, but got an empty result.
>>    - Compared the code for listing OMAP keys between Quincy and Squid
>>    versions; there were no specific changes.
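
Two notes on those steps. The empty result from rados ls on the .rgw.meta
pool is probably expected: that pool keeps its objects in rados namespaces
(root, users.uid, users.keys, ...), so a plain listing shows nothing unless
you pass --all. And my reading of the meta sync code (worth double-checking)
is that full_sync() lists omap keys of a per-shard full-sync index object in
the secondary's log pool, so ret=-2 (ENOENT) would mean that object is
missing. A sketch of checks, with <zone> as a placeholder and the object
name being my assumption:

```shell
# .rgw.meta stores objects in rados namespaces; a plain "rados ls"
# prints nothing, --all lists every namespace.
rados ls -p s3-cdn-dc07.rgw.meta --all

# Look for the per-shard full-sync index objects in the secondary
# zone's log pool (replace <zone> with the local zone name).
rados -p <zone>.rgw.log ls | grep full-sync

# If shard 0's index object exists, it should have omap keys;
# ENOENT here would match the errors in the RGW log.
rados -p <zone>.rgw.log listomapkeys meta.full-sync.index.0
```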
>>
>> I’m looking for any advice or suggestions to resolve this issue.
>>
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



