Re: mons fail as soon as I attempt to mount

Jeremy Hansen <jeremy@xxxxxxxxxx> · Mon, 15 Nov 2021 02:32:27 -0800

This post references a kernel issue:

https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/6JDAPD5IR46JI6R6YGWQORDJTZ5Z2FIU/

I recently updated the kernel on my ceph nodes to 5.15.1. Could this be my issue?
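
In case it helps anyone reading the dmesg output quoted below, here is a rough Python sketch that tallies the libceph events per mon, which makes the "session established, then socket closed" pattern easier to see. It is not part of any Ceph tooling. The names `LINE_RE`, `summarize` and `SAMPLE` are made up for illustration, and the regex only assumes the line layout seen in my log.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the kernel log in this thread, e.g.:
# [ 70.199453] libceph: mon3 (1)192.168.30.14:6789 socket closed (con state OPEN)
LINE_RE = re.compile(
    r"libceph: (?P<mon>mon\d+) \(\d+\)(?P<addr>[\d.]+:\d+) (?P<event>.+)$"
)


def summarize(log_text):
    """Count libceph events per monitor. Returns {mon: {event: count}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in log_text.splitlines():
        match = LINE_RE.search(line.strip())
        if match:  # lines such as "libceph: loaded (...)" are skipped
            counts[match.group("mon")][match.group("event")] += 1
    return {mon: dict(events) for mon, events in counts.items()}


# A few lines copied from the log below, used as sample input.
SAMPLE = """\
[ 51.164266] libceph: mon3 (1)192.168.30.14:6789 session established
[ 70.199453] libceph: mon3 (1)192.168.30.14:6789 socket closed (con state OPEN)
[ 70.199464] libceph: mon3 (1)192.168.30.14:6789 session lost, hunting for new mon
[ 77.904722] libceph: mon3 (1)192.168.30.14:6789 socket closed (con state V1_BANNER)
"""

if __name__ == "__main__":
    for mon, events in sorted(summarize(SAMPLE).items()):
        for event, count in sorted(events.items()):
            print(f"{mon}: {count} x {event}")
```

To use it on a real client, the output of `dmesg` could be saved to a file and passed to `summarize()` as a string.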

-jeremy

> On Sunday, Nov 14, 2021 at 4:37 PM, Jeremy Hansen <jeremy@xxxxxxxxxx> wrote:
> I’m trying to mount a cephfs volume from a new machine. For some reason, it looks like all the mons fail when attempting to mount:
>
> [root@btc04 ~]# mount -t ceph :/ /mnt/ceph -o name=btc
> mount error: no mds server is up or the cluster is laggy
> [root@btc04 ~]# rpm -qa | grep ceph
> python3-cephfs-16.2.4-0.el8.x86_64
> ceph-common-16.2.4-0.el8.x86_64
> cephadm-16.2.4-0.el8.noarch
> python3-ceph-argparse-16.2.4-0.el8.x86_64
> libcephfs2-16.2.4-0.el8.x86_64
> python3-ceph-common-16.2.4-0.el8.x86_64
>
>
> [ 51.105212] libceph: loaded (mon/osd proto 15/24)
> [ 51.145564] ceph: loaded (mds proto 32)
> [ 51.164266] libceph: mon3 (1)192.168.30.14:6789 session established
> [ 70.199453] libceph: mon3 (1)192.168.30.14:6789 socket closed (con state OPEN)
> [ 70.199464] libceph: mon3 (1)192.168.30.14:6789 session lost, hunting for new mon
> [ 70.204400] libceph: mon0 (1)192.168.30.11:6789 session established
> [ 70.771652] libceph: mon0 (1)192.168.30.11:6789 socket closed (con state OPEN)
> [ 70.771670] libceph: mon0 (1)192.168.30.11:6789 session lost, hunting for new mon
> [ 70.774588] libceph: mon4 (1)192.168.30.15:6789 session established
> [ 71.234037] libceph: mon4 (1)192.168.30.15:6789 socket closed (con state OPEN)
> [ 71.234055] libceph: mon4 (1)192.168.30.15:6789 session lost, hunting for new mon
> [ 77.904722] libceph: mon3 (1)192.168.30.14:6789 socket closed (con state V1_BANNER)
> [ 78.160614] libceph: mon3 (1)192.168.30.14:6789 socket closed (con state V1_BANNER)
> [ 78.664602] libceph: mon3 (1)192.168.30.14:6789 socket closed (con state V1_BANNER)
> [ 79.824787] libceph: mon3 (1)192.168.30.14:6789 socket closed (con state V1_BANNER)
> [ 81.808526] libceph: mon3 (1)192.168.30.14:6789 socket closed (con state V1_BANNER)
> [ 85.840430] libceph: mon3 (1)192.168.30.14:6789 socket closed (con state V1_BANNER)
>
>
> Not really sure why…
>
> [ceph: root@cn01 /]# ceph osd pool get cephfs.btc.data all
> size: 4
> min_size: 2
> pg_num: 32
> pgp_num: 32
> crush_rule: replicated_rule
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> use_gmt_hitset: 1
> fast_read: 0
> pg_autoscale_mode: on
> [ceph: root@cn01 /]# ceph osd pool get cephfs.btc.meta all
> size: 4
> min_size: 2
> pg_num: 32
> pgp_num: 32
> crush_rule: replicated_rule
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> use_gmt_hitset: 1
> fast_read: 0
> recovery_priority: 5
> pg_autoscale_mode: on
> pg_num_min: 16
> pg_autoscale_bias: 4
>
>
> The cluster becomes unhealthy during the mount attempt, but the warning clears shortly after the client times out:
>
> [ceph: root@cn01 /]# ceph -s
>   cluster:
>     id:     bfa2ad58-c049-11eb-9098-3c8cf8ed728d
>     health: HEALTH_OK
>
>   services:
>     mon: 5 daemons, quorum cn05,cn02,cn03,cn04,cn01 (age 6m)
>     mgr: cn05.vpuwau(active, since 6d), standbys: cn02.arszct
>     mds: 2/2 daemons up, 4 standby
>     osd: 35 osds: 35 up (since 2d), 35 in (since 6d)
>
>   data:
>     volumes: 2/2 healthy
>     pools:   6 pools, 289 pgs
>     objects: 6.73M objects, 4.3 TiB
>     usage:   17 TiB used, 108 TiB / 126 TiB avail
>     pgs:     289 active+clean
>
>   io:
>     client: 0 B/s rd, 105 KiB/s wr, 2 op/s rd, 13 op/s wr
>
>
>
> -jeremy
>
>
>