Re: RGW multisite metadata sync issue

When I increased the debug level of the RGW sync client to 20, I saw the following:

2024-12-23T09:42:17.248+0000 7f124866b700 20 register_request
mgr=0x5633b4b8d958 req_data->id=79162099, curl_handle=0x5633bb89ee60
2024-12-23T09:42:17.248+0000 7f124866b700 20 run: stack=0x5633b51952c0 is
io blocked
2024-12-23T09:42:17.248+0000 7f1248e6c700 20 link_request
req_data=0x5633bc240d20 req_data->id=79162099, curl_handle=0x5633bb89ee60
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633b50f1540:op=0x5633b50bb100:25RGWMetaSyncShardControlCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633b50f1540:op=0x5633b520f400:18RGWMetaSyncShardCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 10 RGW-SYNC:meta:shard[0]: start
full sync
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633bb892000:op=0x5633b7731500:20RGWContinuousLeaseCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633b50f1540:op=0x5633b520f400:18RGWMetaSyncShardCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 run: stack=0x5633b50f1540
is_blocked_by_stack()=0 is_sleeping=1 waiting_for_child()=0
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633bb892000:op=0x5633b796ce00:20RGWSimpleRadosLockCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633bb892000:op=0x5633b796ce00:20RGWSimpleRadosLockCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 enqueued request
req=0x5633be14c500
2024-12-23T09:42:17.252+0000 7f124866b700 20 RGWWQ:
2024-12-23T09:42:17.252+0000 7f124866b700 20 req: 0x5633be14c500
2024-12-23T09:42:17.252+0000 7f124866b700 20 run: stack=0x5633bb892000 is
io blocked
2024-12-23T09:42:17.252+0000 7f125167d700 20 dequeued request
req=0x5633be14c500
2024-12-23T09:42:17.252+0000 7f125167d700 20 RGWWQ: empty
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633bb892000:op=0x5633b796ce00:20RGWSimpleRadosLockCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633bb892000:op=0x5633b796ce00:20RGWSimpleRadosLockCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633bb892000:op=0x5633b796ce00:20RGWSimpleRadosLockCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633bb892000:op=0x5633b796ce00:20RGWSimpleRadosLockCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633bb892000:op=0x5633b7731500:20RGWContinuousLeaseCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 run: stack=0x5633bb892000 is
io blocked
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633b50f1540:op=0x5633b520f400:18RGWMetaSyncShardCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 10 RGW-SYNC:meta:shard[0]: took
lease
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633b50f1540:op=0x5633b796ce00:21RGWRadosGetOmapKeysCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633b50f1540:op=0x5633b796ce00:21RGWRadosGetOmapKeysCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 run: stack=0x5633b50f1540 is
io blocked
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633b50f1540:op=0x5633b796ce00:21RGWRadosGetOmapKeysCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633b50f1540:op=0x5633b796ce00:21RGWRadosGetOmapKeysCR: operate()
returned r=-2
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633b50f1540:op=0x5633b520f400:18RGWMetaSyncShardCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700  0 meta sync: ERROR: full_sync():
RGWRadosGetOmapKeysCR() returned ret=-2
2024-12-23T09:42:17.252+0000 7f124866b700  0 RGW-SYNC:meta:shard[0]: ERROR:
failed to list omap keys, status=-2
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633b50f1540:op=0x5633b520f400:18RGWMetaSyncShardCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 run: stack=0x5633b50f1540
is_blocked_by_stack()=0 is_sleeping=0 waiting_for_child()=1
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633bb892000:op=0x5633b7731500:20RGWContinuousLeaseCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633bb892000:op=0x5633b796ce00:22RGWSimpleRadosUnlockCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 rgw rados thread:
cr:s=0x5633bb892000:op=0x5633b796ce00:22RGWSimpleRadosUnlockCR: operate()
2024-12-23T09:42:17.252+0000 7f124866b700 20 enqueued request
req=0x5633bd10ea20

And when I run the following command to list the omap values, the output is empty:
rados -p s3-cdn-dc07.rgw.meta listomapvals -N root .bucket.meta.XXXX:XXXX.9
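
For reference, if I am reading the Quincy source correctly, the full sync
for shard 0 lists omap keys on an index object named meta.full-sync.index.0
in the zone's log pool, so the -2 (ENOENT) could simply mean that object is
missing. A quick check (pool and object names are my assumption from the
source; adjust to your zone):

# List all objects in the log pool across namespaces and look for the
# full-sync index objects.
rados -p s3-cdn-dc07.rgw.log ls --all | grep full-sync

# If meta.full-sync.index.0 exists, this should print the pending entries
# for shard 0; ENOENT here would match the RGWRadosGetOmapKeysCR ret=-2.
rados -p s3-cdn-dc07.rgw.log listomapkeys meta.full-sync.index.0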

When I list the PGs of the meta pool with ceph pg ls-by-pool
s3-cdn-dc07.rgw.meta, some PGs have a nonzero BYTES value while their
OMAP_BYTES and OMAP_KEYS are 0 (rows marked with a leading * below; see
also the one-liner after the table):

PG     OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG   STATE         SINCE  VERSION       REPORTED       UP             ACTING         SCRUB_STAMP                      DEEP_SCRUB_STAMP                 LAST_SCRUB_DURATION  SCRUB_SCHEDULING
10.0   14  0  0  0  3364  1676  9   957   active+clean  6h   57938'32678   57939:9604093  [70,88,46]p70  [70,88,46]p70  2024-12-23T04:57:43.792352+0000  2024-12-21T22:45:49.820084+0000  1  periodic scrub scheduled @ 2024-12-24T16:29:19.844180+0000
*10.1  10  0  0  0  3073  0     0   49    active+clean  14h  57715'25516   57939:8975816  [0,60,52]p0    [0,60,52]p0    2024-12-22T20:40:57.494355+0000  2024-12-17T17:43:43.383320+0000  1  periodic scrub scheduled @ 2024-12-23T22:30:45.498616+0000
10.2   15  0  0  0  3706  188   1   132   active+clean  27h  57938'26080   57938:8966743  [39,14,1]p39   [39,14,1]p39   2024-12-22T08:05:34.147330+0000  2024-12-22T08:05:34.147330+0000  1  periodic scrub scheduled @ 2024-12-23T12:00:40.771526+0000
10.3   14  0  0  0  3898  2234  11  414   active+clean  2h   57939'36531   57939:9388345  [40,78,6]p40   [40,78,6]p40   2024-12-23T09:00:29.100038+0000  2024-12-16T12:58:31.054537+0000  1  periodic scrub scheduled @ 2024-12-24T14:36:29.705903+0000
*10.4  13  0  0  0  3346  0     0   67    active+clean  25h  57715'25662   57939:8964482  [34,44,25]p34  [34,44,25]p34  2024-12-22T09:26:16.897803+0000  2024-12-16T23:09:15.271909+0000  1  periodic scrub scheduled @ 2024-12-23T20:57:45.809155+0000
*10.5  13  0  0  0  3567  0     0   2904  active+clean  28h  57939'120020  57939:9016514  [66,37,91]p66  [66,37,91]p66  2024-12-22T07:03:26.677870+0000  2024-12-19T09:16:25.987456+0000  1  periodic scrub scheduled @ 2024-12-23T17:12:34.661323+0000
*10.6  7   0  0  0  2159  0     0   39    active+clean  15h  57938'25500   57939:9961651  [19,24,56]p19  [19,24,56]p19  2024-12-22T19:16:11.782123+0000  2024-12-22T19:16:11.782123+0000  1  periodic scrub scheduled @ 2024-12-24T01:40:30.212204+0000
*10.7  20  0  0  0  5963  0     0   116   active+clean  33m  57715'25722   57939:9028884  [50,58,84]p50  [50,58,84]p50  2024-12-23T10:35:45.237158+0000  2024-12-17T03:28:34.643774+0000  1  periodic scrub scheduled @ 2024-12-24T18:39:23.977660+0000
10.8   20  0  0  0  6074  2155  11  226   active+clean  9h   57938'25807   57939:9347768  [22,4,42]p22   [22,4,42]p22   2024-12-23T02:07:21.060581+0000  2024-12-21T23:16:53.596335+0000  1  periodic scrub scheduled @ 2024-12-24T10:21:26.911522+0000
*10.9  10  0  0  0  3046  0     0   63    active+clean  10h  57715'25597   57939:9021141  [10,38,15]p10  [10,38,15]p10  2024-12-23T00:19:34.157798+0000  2024-12-20T11:28:20.294176+0000  1  periodic scrub scheduled @ 2024-12-24T01:15:58.674425+0000
10.a   13  0  0  0  3400  188   1   79    active+clean  6h   57715'25740   57938:9021299  [2,31,80]p2    [2,31,80]p2    2024-12-23T04:58:25.809790+0000  2024-12-16T15:51:26.184478+0000  1  periodic scrub scheduled @ 2024-12-24T05:19:15.221400+0000
10.b   11  0  0  0  2915  176   1   68    active+clean  3h   57854'25555   57939:9067491  [53,82,20]p53  [53,82,20]p53  2024-12-23T07:52:31.500216+0000  2024-12-22T07:11:15.383987+0000  1  periodic scrub scheduled @ 2024-12-24T11:42:16.438508+0000
10.c   18  0  0  0  4430  2182  11  458   active+clean  9h   57939'36171   57939:9036209  [16,48,8]p16   [16,48,8]p16   2024-12-23T01:46:59.720703+0000  2024-12-21T19:11:29.473994+0000  1  periodic scrub scheduled @ 2024-12-24T11:00:21.726487+0000
10.d   13  0  0  0  3135  187   1   122   active+clean  8h   57901'26081   57939:9015503  [36,3,13]p36   [36,3,13]p36   2024-12-23T02:31:13.107381+0000  2024-12-16T21:47:29.385720+0000  1  periodic scrub scheduled @ 2024-12-24T10:01:35.665610+0000
*10.e  14  0  0  0  3366  0     0   71    active+clean  7h   57715'25607   57939:9843618  [83,41,17]p83  [83,41,17]p83  2024-12-23T03:39:55.648905+0000  2024-12-20T08:01:47.715401+0000  1  periodic scrub scheduled @ 2024-12-24T09:58:14.031613+0000
10.f   14  0  0  0  4105  2508  13  521   active+clean  20h  57939'39421   57939:8983895  [33,49,68]p33  [33,49,68]p33  2024-12-22T14:39:32.975012+0000  2024-12-22T14:39:32.975012+0000  1  periodic scrub scheduled @ 2024-12-23T23:39:32.875651+0000
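
A one-liner like this should filter out just the zero-omap PGs, assuming
the default column order of ceph pg ls-by-pool shown above (column
positions may shift between releases):

# Print PG ids where both OMAP_BYTES (col 7) and OMAP_KEYS (col 8) are 0.
ceph pg ls-by-pool s3-cdn-dc07.rgw.meta | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && $7 == 0 && $8 == 0 {print $1}'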

I tried to fix them with ceph pg deep-scrub, but nothing changed!
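
(Deep scrub only detects inconsistencies between replicas, so if all three
copies agree on the empty omap there is nothing for it to flag; a repair
pass would be the next escalation, though I would not expect it to help for
the same reason:)

ceph pg repair 10.1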

I started checking the primary OSD log for PG 10.1 (osd.0) in detail and
see the entries below. Why are these FIFO part headers failing to decode?
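
To find the acting primary and pull these entries, I did roughly the
following (the log path assumes a package-based install; adjust for
cephadm containers):

# Map the PG to its acting set; per the table above the primary is osd.0.
ceph pg map 10.1

# Grep the primary's log for the cls_fifo errors quoted below.
grep cls_fifo /var/log/ceph/ceph-osd.0.log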

2024-12-23T00:03:38.920+0000 7f3fef2d7700 0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/cls/fifo/cls_fifo.cc:112: ERROR: int rados::cls::fifo::{anonymous}::read_part_header(cls_method_context_t, rados::cls::fifo::part_header*): failed decoding part header
2024-12-23T00:03:38.920+0000 7f3fef2d7700 0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/cls/fifo/cls_fifo.cc:781: int rados::cls::fifo::{anonymous}::trim_part(cls_method_context_t, ceph::buffer::v15_2_0::list*, ceph::buffer::v15_2_0::list*): failed to read part header
2024-12-23T00:03:38.920+0000 7f3feb2cf700 0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/cls/fifo/cls_fifo.cc:112: ERROR: int rados::cls::fifo::{anonymous}::read_part_header(cls_method_context_t, rados::cls::fifo::part_header*): failed decoding part header
2024-12-23T00:03:38.920+0000 7f3feb2cf700 0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/cls/fifo/cls_fifo.cc:781: int rados::cls::fifo::{anonymous}::trim_part(cls_method_context_t, ceph::buffer::v15_2_0::list*, ceph::buffer::v15_2_0::list*): failed to read part header
2024-12-23T00:03:38.920+0000 7f3feb2cf700 0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/cls/fifo/cls_fifo.cc:112: ERROR: int rados::cls::fifo::{anonymous}::read_part_header(cls_method_context_t, rados::cls::fifo::part_header*): failed decoding part header
2024-12-23T00:03:38.920+0000 7f3feb2cf700 0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.5/rpm/el8/BUILD/ceph-17.2.5/src/cls/fifo/cls_fifo.cc:781: int rados::cls::fifo::{anonymous}::trim_part(cls_method_context_t, ceph::buffer::v15_2_0::list*, ceph::buffer::v15_2_0::list*): failed to read part header
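
Since these cls_fifo messages come from trim/read operations on the RGW
log FIFOs, it may be worth checking which FIFO objects are present and
readable. The object names below are my assumption from the FIFO layout
(a head object plus numbered part objects):

# List the datalog/mdlog FIFO head and part objects in the log pool.
rados -p s3-cdn-dc07.rgw.log ls | grep -E '^(data_log|meta\.log)' | sort

# Stat a suspect object, e.g. the data_log.44 FIFO from the earlier log
# excerpt quoted below, to see whether it exists at all.
rados -p s3-cdn-dc07.rgw.log stat data_log.44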

On Fri, Dec 20, 2024 at 10:59 AM Eugen Block <eblock@xxxxxx> wrote:

> I don't really have a good idea, except for maybe running "metadata
> sync init" and "metadata sync run"?
> But I wanted to point out that the .meta pool uses namespaces.
>
> > - Attempted to list metadata in the pool using rados ls -p
> > s3-cdn-dc07.rgw.meta, but got an empty result.
>
> Try this instead:
>
> rados -p s3-cdn-dc07.rgw.meta ls --all
>
> Do you have a specific object where the omap data is missing? Maybe
> increase debug_rgw logs to find a trace why it's missing.
>
> Quoting Vahideh Alinouri <vahideh.alinouri@xxxxxxxxx>:
>
> > I also see this in the output of radosgw-admin metadata sync status. I
> > think it's strange because there should be a marker to follow the sync.
> >             {
> >                 "key": 0,
> >                 "val": {
> >                     "state": 0,
> >                     "marker": "",
> >                     "next_step_marker":
> "1_1730469205.875723_877487777.1",
> >                     "total_entries": 174,
> >                     "pos": 0,
> >                     "timestamp": "2024-11-01T13:53:25.875723Z",
> >                     "realm_epoch": 0
> >                 }
> >
> > On Mon, Dec 16, 2024 at 1:24 PM Vahideh Alinouri <
> vahideh.alinouri@xxxxxxxxx>
> > wrote:
> >
> >> I also see this log in the RGW log:
> >>
> >> 2024-12-16T12:23:58.651+0000 7f9b2b9fe700  1 ====== starting new request
> >> req=0x7f9ad9959730 =====
> >> 2024-12-16T12:23:58.651+0000 7f9b2b9fe700 -1 req 11778501317150336521
> >> 0.000000000s :list_data_changes_log int
> >> rgw::cls::fifo::{anonymous}::list_part(const DoutPrefixProvider*,
> >> librados::v14_2_0::IoCtx&, const string&,
> >> std::optional<std::basic_string_view<char> >, uint64_t, uint64_t,
> >> std::vector<rados::cls::fifo::part_list_entry>*, bool*, bool*,
> >> std::string*, uint64_t, optional_yield):245 fifo::op::LIST_PART failed
> >> r=-34 tid=4176
> >> 2024-12-16T12:23:58.651+0000 7f9b2b9fe700 -1 req 11778501317150336521
> >> 0.000000000s :list_data_changes_log int rgw::cls::fifo::FIFO::list(const
> >> DoutPrefixProvider*, int, std::optional<std::basic_string_view<char> >,
> >> std::vector<rgw::cls::fifo::list_entry>*, bool*, optional_yield):1660
> >> list_entries failed: r=-34 tid= 4176
> >> 2024-12-16T12:23:58.651+0000 7f9b2b9fe700 -1 req 11778501317150336521
> >> 0.000000000s :list_data_changes_log virtual int
> >> RGWDataChangesFIFO::list(const DoutPrefixProvider*, int, int,
> >> std::vector<rgw_data_change_log_entry>&,
> >> std::optional<std::basic_string_view<char> >, std::string*, bool*):
> unable
> >> to list FIFO: data_log.44: (34) Numerical result out of range
> >>
> >> On Sun, Dec 15, 2024 at 10:45 PM Vahideh Alinouri <
> >> vahideh.alinouri@xxxxxxxxx> wrote:
> >>
> >>> Hi guys,
> >>>
> >>> My Ceph release is Quincy 17.2.5. I need to change the master zone to
> >>> decommission the old one and upgrade all zones. I have separated the
> client
> >>> traffic and sync traffic in RGWs, meaning there are separate RGW
> daemons
> >>> handling the sync process.
> >>>
> >>> I encountered an issue when trying to sync one of the zones in the
> >>> zonegroup. The data sync is proceeding fine, but I have an issue with
> the
> >>> metadata sync. It gets stuck behind on a shard. Here is the output
> >>> from radosgw-admin
> >>> sync status:
> >>>
> >>> metadata sync syncing
> >>>     full sync: 1/64 shards
> >>>     full sync: 135 entries to sync
> >>>     incremental sync: 63/64 shards
> >>>     metadata is behind on 1 shard
> >>>     behind shards: [0]
> >>>
> >>> In the RGW log, I see this error:
> >>> 2024-12-15T21:30:59.641+0000 7f6dff472700 1 beast: 0x7f6d2f1cf730:
> >>> 172.19.66.112 - s3-cdn-user [15/Dec/2024:21:30:59.641 +0000] "GET
> >>>
> /admin/log/?type=data&id=56&marker=00000000000000000000%3A00000000000000204086&extra-info=true&rgwx-zonegroup=7c01d60f-88c6-4192-baf7-d725260bf05d
> >>> HTTP/1.1" 200 44 - - - latency=0.000000000s
> >>> 2024-12-15T21:30:59.701+0000 7f6e44d1e700 0 meta sync: ERROR:
> >>> full_sync(): RGWRadosGetOmapKeysCR() returned ret=-2
> >>> 2024-12-15T21:30:59.701+0000 7f6e44d1e700 0 RGW-SYNC:meta:shard[0]:
> >>> ERROR: failed to list omap keys, status=-2
> >>> 2024-12-15T21:30:59.701+0000 7f6e44d1e700 0 meta sync: ERROR:
> >>> RGWBackoffControlCR called coroutine returned -2
> >>> 2024-12-15T21:31:00.705+0000 7f6e44d1e700 0 meta sync: ERROR:
> >>> full_sync(): RGWRadosGetOmapKeysCR() returned ret=-2
> >>> 2024-12-15T21:31:00.705+0000 7f6e44d1e700 0 RGW-SYNC:meta:shard[0]:
> >>> ERROR: failed to list omap keys, status=-2
> >>> 2024-12-15T21:31:00.705+0000 7f6e44d1e700 0 meta sync: ERROR:
> >>> RGWBackoffControlCR called coroutine returned -2
> >>>
> >>> I’ve tried the following steps:
> >>>
> >>>    - Changed the PG number of the metadata pool to force a rebalance,
> >>>    but everything was fine.
> >>>    - Ran metadata sync init and tried to run it again.
> >>>    - Restarted RGW services in both the zone and the master zone.
> >>>    - Created a user in the master zone to ensure metadata sync works,
> >>>    which was successful.
> >>>    - Checked OSD logs but didn’t see any specific errors.
> >>>    - Attempted to list metadata in the pool using rados ls -p
> >>>    s3-cdn-dc07.rgw.meta, but got an empty result.
> >>>    - Compared the code for listing OMAP keys between Quincy and Squid
> >>>    versions; there were no specific changes.
> >>>
> >>> I’m looking for any advice or suggestions to resolve this issue.
> >>>
> >>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



