On Wed, May 3, 2023 at 11:24 AM Kamil Madac <kamil.madac@xxxxxxxxx> wrote:
>
> Hi,
>
> We deployed a Pacific cluster (16.2.12) with cephadm. We are seeing the
> following error during rbd map:
>
> [Wed May 3 08:59:11 2023] libceph: mon2 (1)[2a00:da8:ffef:1433::]:6789 session established
> [Wed May 3 08:59:11 2023] libceph: another match of type 1 in addrvec
> [Wed May 3 08:59:11 2023] libceph: corrupt full osdmap (-22) epoch 200 off 1042 (000000009876284d of 000000000cb24b58-0000000080b70596)
> [Wed May 3 08:59:11 2023] osdmap: 00000000: 08 07 7d 10 00 00 09 01 5d 09 00 00 a2 22 3b 86  ..}.....]....";.
> [Wed May 3 08:59:11 2023] osdmap: 00000010: e4 f5 11 ed 99 ee 47 75 ca 3c ad 23 c8 00 00 00  ......Gu.<.#....
> [Wed May 3 08:59:11 2023] osdmap: 00000020: 21 68 4a 64 98 d2 5d 2e 84 fd 50 64 d9 3a 48 26  !hJd..]...Pd.:H&
> [Wed May 3 08:59:11 2023] osdmap: 00000030: 02 00 00 00 01 00 00 00 00 00 00 00 1d 05 71 01  ..............q.
> ....
>
> The Linux kernel is 6.1.13, and the important thing is that we are using
> IPv6 addresses to connect to the Ceph nodes.
> We were able to map the rbd from a client with kernel 5.10, but in the
> prod environment we are not allowed to use that kernel.
>
> What could be the reason for such behavior on newer kernels, and how can
> we troubleshoot it?
>
> Here is the output of ceph osd dump:
>
> # ceph osd dump
> epoch 200
> fsid a2223b86-e4f5-11ed-99ee-4775ca3cad23
> created 2023-04-27T12:18:41.777900+0000
> modified 2023-05-02T12:09:40.642267+0000
> flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
> crush_version 34
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client luminous
> min_compat_client jewel
> require_osd_release pacific
> stretch_mode_enabled false
> pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 183 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr_devicehealth
> pool 2 'idp' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 48 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> max_osd 3
> osd.0 up in weight 1 up_from 176 up_thru 182 down_at 172 last_clean_interval [170,171) [v2:[2a00:da8:ffef:1431::]:6800/805023868,v1:[2a00:da8:ffef:1431::]:6801/805023868,v2:0.0.0.0:6802/805023868,v1:0.0.0.0:6803/805023868] [v2:[2a00:da8:ffef:1431::]:6804/805023868,v1:[2a00:da8:ffef:1431::]:6805/805023868,v2:0.0.0.0:6806/805023868,v1:0.0.0.0:6807/805023868] exists,up e8fd0ee2-ea63-4d02-8f36-219d36869078
> osd.1 up in weight 1 up_from 136 up_thru 182 down_at 0 last_clean_interval [0,0) [v2:[2a00:da8:ffef:1432::]:6800/2172723816,v1:[2a00:da8:ffef:1432::]:6801/2172723816,v2:0.0.0.0:6802/2172723816,v1:0.0.0.0:6803/2172723816] [v2:[2a00:da8:ffef:1432::]:6804/2172723816,v1:[2a00:da8:ffef:1432::]:6805/2172723816,v2:0.0.0.0:6806/2172723816,v1:0.0.0.0:6807/2172723816] exists,up 0b7b5628-9273-4757-85fb-9c16e8441895
> osd.2 up in weight 1 up_from 182 up_thru 182 down_at 178 last_clean_interval [123,177) [v2:[2a00:da8:ffef:1433::]:6800/887631330,v1:[2a00:da8:ffef:1433::]:6801/887631330,v2:0.0.0.0:6802/887631330,v1:0.0.0.0:6803/887631330] [v2:[2a00:da8:ffef:1433::]:6804/887631330,v1:[2a00:da8:ffef:1433::]:6805/887631330,v2:0.0.0.0:6806/887631330,v1:0.0.0.0:6807/887631330] exists,up 21f8d0d5-6a3f-4f78-96c8-8ec4e4f78a01

Hi Kamil,

The issue is the bogus 0.0.0.0 addresses in the OSD address vectors above. Newer kernels reject an addrvec that contains more than one address of the same type (the "another match of type 1 in addrvec" error) and treat the osdmap as corrupt, returning -22 (EINVAL); older kernels such as 5.10 don't perform this check, which is why the map happened to work there.

This came up before, see [1] and later messages from Stefan in the thread. You would need to ensure that ms_bind_ipv4 is set to false and restart the OSDs.
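Roughly, something like this should do it (untested sketch; the OSD service name below is a placeholder, "ceph orch ls" shows the real one for your spec):

  # stop the OSDs from binding/advertising IPv4 addresses
  ceph config set global ms_bind_ipv4 false
  ceph config set global ms_bind_ipv6 true    # normally already true on an IPv6 cluster

  # restart the OSDs so they re-register with clean address vectors
  ceph orch restart osd.<service-name>

  # once a new osdmap is in, there should be no 0.0.0.0 entries left
  ceph osd dump | grep 0.0.0.0

After that, the rbd map from the 6.1 client should go through.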
[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/Q6VYRJBPHQI63OQTBJG2N3BJD2KBEZM4/

Thanks,

                Ilya