Re: rbd map: corrupt full osdmap (-22) when

Thanks for the info.

As a workaround we used rbd-nbd, which works fine without any issues. If we
have time, we will also try disabling IPv4 on the cluster and attempt kernel
rbd mapping again. Are there any disadvantages to using NBD instead of the
kernel driver?
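
For reference, a minimal sketch of how we map via NBD (the image name below
is just a placeholder for our actual image in the 'idp' pool):

  # rbd-nbd map idp/<image>
  /dev/nbd0
  # rbd-nbd unmap /dev/nbd0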

Thanks

On Wed, May 3, 2023 at 4:06 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:

> On Wed, May 3, 2023 at 11:24 AM Kamil Madac <kamil.madac@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > We deployed a Pacific cluster (16.2.12) with cephadm. We are seeing the
> > following error during rbd map:
> >
> > [Wed May  3 08:59:11 2023] libceph: mon2 (1)[2a00:da8:ffef:1433::]:6789 session established
> > [Wed May  3 08:59:11 2023] libceph: another match of type 1 in addrvec
> > [Wed May  3 08:59:11 2023] libceph: corrupt full osdmap (-22) epoch 200 off 1042 (000000009876284d of 000000000cb24b58-0000000080b70596)
> > [Wed May  3 08:59:11 2023] osdmap: 00000000: 08 07 7d 10 00 00 09 01 5d 09 00 00 a2 22 3b 86  ..}.....]....";.
> > [Wed May  3 08:59:11 2023] osdmap: 00000010: e4 f5 11 ed 99 ee 47 75 ca 3c ad 23 c8 00 00 00  ......Gu.<.#....
> > [Wed May  3 08:59:11 2023] osdmap: 00000020: 21 68 4a 64 98 d2 5d 2e 84 fd 50 64 d9 3a 48 26  !hJd..]...Pd.:H&
> > [Wed May  3 08:59:11 2023] osdmap: 00000030: 02 00 00 00 01 00 00 00 00 00 00 00 1d 05 71 01  ..............q.
> > ....
> >
> > The Linux kernel is 6.1.13, and the important detail is that we are using
> > IPv6 addresses to connect to the Ceph nodes. We were able to map an rbd
> > image from a client with kernel 5.10, but in the prod environment we are
> > not allowed to use that kernel.
> >
> > What could be the reason for this behavior on newer kernels, and how can
> > we troubleshoot it?
> >
> > Here is the output of ceph osd dump:
> >
> > # ceph osd dump
> > epoch 200
> > fsid a2223b86-e4f5-11ed-99ee-4775ca3cad23
> > created 2023-04-27T12:18:41.777900+0000
> > modified 2023-05-02T12:09:40.642267+0000
> > flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
> > crush_version 34
> > full_ratio 0.95
> > backfillfull_ratio 0.9
> > nearfull_ratio 0.85
> > require_min_compat_client luminous
> > min_compat_client jewel
> > require_osd_release pacific
> > stretch_mode_enabled false
> > pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 183 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr_devicehealth
> > pool 2 'idp' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 48 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> > max_osd 3
> > osd.0 up   in  weight 1 up_from 176 up_thru 182 down_at 172 last_clean_interval [170,171) [v2:[2a00:da8:ffef:1431::]:6800/805023868,v1:[2a00:da8:ffef:1431::]:6801/805023868,v2:0.0.0.0:6802/805023868,v1:0.0.0.0:6803/805023868] [v2:[2a00:da8:ffef:1431::]:6804/805023868,v1:[2a00:da8:ffef:1431::]:6805/805023868,v2:0.0.0.0:6806/805023868,v1:0.0.0.0:6807/805023868] exists,up e8fd0ee2-ea63-4d02-8f36-219d36869078
> > osd.1 up   in  weight 1 up_from 136 up_thru 182 down_at 0 last_clean_interval [0,0) [v2:[2a00:da8:ffef:1432::]:6800/2172723816,v1:[2a00:da8:ffef:1432::]:6801/2172723816,v2:0.0.0.0:6802/2172723816,v1:0.0.0.0:6803/2172723816] [v2:[2a00:da8:ffef:1432::]:6804/2172723816,v1:[2a00:da8:ffef:1432::]:6805/2172723816,v2:0.0.0.0:6806/2172723816,v1:0.0.0.0:6807/2172723816] exists,up 0b7b5628-9273-4757-85fb-9c16e8441895
> > osd.2 up   in  weight 1 up_from 182 up_thru 182 down_at 178 last_clean_interval [123,177) [v2:[2a00:da8:ffef:1433::]:6800/887631330,v1:[2a00:da8:ffef:1433::]:6801/887631330,v2:0.0.0.0:6802/887631330,v1:0.0.0.0:6803/887631330] [v2:[2a00:da8:ffef:1433::]:6804/887631330,v1:[2a00:da8:ffef:1433::]:6805/887631330,v2:0.0.0.0:6806/887631330,v1:0.0.0.0:6807/887631330] exists,up 21f8d0d5-6a3f-4f78-96c8-8ec4e4f78a01
>
> Hi Kamil,
>
> The issue is bogus 0.0.0.0 addresses.  This came up before, see [1] and
> later messages from Stefan in the thread.  You would need to ensure that
> ms_bind_ipv4 is set to false and restart OSDs.
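>
> For example, a rough sketch assuming a cephadm-managed cluster (the exact
> restart command depends on how the OSDs are run):
>
>   ceph config set global ms_bind_ipv4 false
>   ceph orch daemon restart osd.0    # repeat for each OSD daemon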
>
> [1]
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/Q6VYRJBPHQI63OQTBJG2N3BJD2KBEZM4/
>
> Thanks,
>
>                 Ilya
>


-- 
Kamil Madac <https://kmadac.github.io/>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



