Hi,

I found my error: it was a mismatch between the monitor IP address and the --cluster_network, which were in different subnets. I had misunderstood the --cluster_network option; I thought that when creating a cluster, the monitor IP designated the public network, and that if I wanted to separate the public and private (cluster) networks, I had to use the --cluster_network option. Maybe I was in over my head, but sometimes it is not that clear.

Regards.

On Fri, Mar 15, 2024 at 07:18, wodel youchi <wodel.youchi@xxxxxxxxx> wrote:

> Hi,
>
> Note: the firewall is disabled on all hosts.
>
> Regards.
>
> On Fri, Mar 15, 2024 at 06:42, wodel youchi <wodel.youchi@xxxxxxxxx> wrote:
>
>> Hi,
>>
>> I recreated the cluster again, and this is the result.
>>
>> This is my initial bootstrap:
>>
>> cephadm --image 192.168.2.36:4000/ceph/ceph:v18 bootstrap \
>>   --initial-dashboard-user admin --initial-dashboard-password adminpass \
>>   --dashboard-password-noupdate \
>>   --registry-url 192.168.2.36:4000 --registry-username admin --registry-password admin \
>>   --mon-ip 20.1.0.23 --cluster-network 20.2.0.0/16 \
>>   --ssh-private-key /root/.ssh/id_rsa --ssh-public-key /root/.ssh/id_rsa.pub \
>>   -c initial-ceph.conf
>>
>> This is my initial-ceph.conf:
>>
>> [mgr]
>> mgr/cephadm/container_image_prometheus = 192.168.2.36:4000/prometheus/prometheus:v2.43.0
>> mgr/cephadm/container_image_node_exporter = 192.168.2.36:4000/prometheus/node-exporter:v1.5.0
>> mgr/cephadm/container_image_grafana = 192.168.2.36:4000/ceph/ceph-grafana:9.4.7
>> mgr/cephadm/container_image_alertmanager = 192.168.2.36:4000/prometheus/alertmanager:v0.25.0
>>
>> Then I added two more managers and monitors:
>>
>> # ceph orch host add controllerb 20.1.0.27 _admin
>> # ceph orch host add controllerc 20.1.0.31 _admin
>> # ceph orch apply mon --placement="3 controllera controllerb controllerc"
>> # ceph orch apply mgr --placement="3 controllera controllerb controllerc"
>>
>> Then I added node-exporter, prometheus, grafana and crash.
>>
>> Then I added the OSD hosts:
>>
>> # ceph orch host add computehci01 20.1.0.2
>> # ceph orch host add computehci02 20.1.0.3
>> # ceph orch host add computehci03 20.1.0.4
>> ...
>> # ceph orch host add computehci09 20.1.0.10
>> ...
>>
>> And finally I added the OSD daemons:
>>
>> # ceph orch daemon add osd computehci01:/dev/nvme0n1,/dev/nvme1n1,/dev/nvme2n1,/dev/nvme3n1
>> # ceph orch daemon add osd computehci02:/dev/nvme0n1,/dev/nvme1n1,/dev/nvme2n1,/dev/nvme3n1
>> ...
>>
>> I created a pool:
>>
>> # ceph osd pool create volumes replicated
>> # ceph osd pool application enable volumes rbd
>>
>> I even created the CephFS pools and added an MDS service, but 100% of the PGs are still unknown?!
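A note for anyone who finds this thread with the same symptom: "unknown" means the mgr has not yet received any state for those PGs from the OSDs. The checks below are plain Ceph CLI; nothing in them is specific to my cluster:

# ceph health detail
# ceph pg dump_stuck inactive
# ceph osd tree

In a situation like this they mainly confirm that every PG is stuck inactive/unknown; the real cause here turned out to be the network mismatch described at the top of this message.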
>> [root@controllera ~]# ceph -s
>>   cluster:
>>     id:     df914aa2-e21a-11ee-b8df-3cecef2872f0
>>     health: HEALTH_WARN
>>             1 MDSs report slow metadata IOs
>>             Reduced data availability: 4 pgs inactive
>>
>>   services:
>>     mon: 3 daemons, quorum controllera,controllerc,controllerb (age 13h)
>>     mgr: controllera.ajttxz(active, since 13h), standbys: controllerb.qtixeq, controllerc.pqyqqo
>>     mds: 1/1 daemons up, 2 standby
>>     osd: 36 osds: 36 up (since 7h), 36 in (since 7h)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   4 pools, 4 pgs
>>     objects: 0 objects, 0 B
>>     usage:   1.1 GiB used, 110 TiB / 110 TiB avail
>>     pgs:     100.000% pgs unknown
>>              4 unknown
>>
>> [root@controllera ~]# ceph osd pool ls detail
>> pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 46 flags hashpspool,creating stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
>> pool 2 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 154 flags hashpspool,creating stripe_width 0 application rbd
>> pool 3 'cephfs' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 157 flags hashpspool,creating stripe_width 0 application cephfs
>> pool 4 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 157 flags hashpspool,creating stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
>>
>> What am I missing? Why won't the PGs peer?
>>
>> Regards.
>>
>> On Thu, Mar 14, 2024 at 15:36, wodel youchi <wodel.youchi@xxxxxxxxx> wrote:
>>
>>> Hi,
>>>
>>> I am creating a new Ceph cluster using Reef.
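Coming back to the root cause mentioned at the top of this thread: the commands below show which public/cluster subnets the cluster actually recorded and which addresses each OSD registered. They are plain Ceph CLI, and nothing in them is specific to this cluster:

# ceph config get mon public_network
# ceph config get mon cluster_network
# ceph osd dump | grep '^osd\.'

The osd.N lines of ceph osd dump include both the public and the cluster address each OSD bound to. If those addresses, the monitor IP and the configured subnets do not line up (for example because a host has no interface in the cluster subnet, which ip -br addr on that host will show), the daemons cannot reach each other where Ceph expects them to, and PGs can stay inactive/unknown as they did here.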
>>> This is my host_specs file:
>>>
>>> [root@controllera config]# cat hosts-specs2.yml
>>> service_type: host
>>> hostname: computehci01
>>> addr: 20.1.0.2
>>> location:
>>>   chassis: chassis1
>>> ---
>>> service_type: host
>>> hostname: computehci02
>>> addr: 20.1.0.3
>>> location:
>>>   chassis: chassis1
>>> ---
>>> service_type: host
>>> hostname: computehci03
>>> addr: 20.1.0.4
>>> location:
>>>   chassis: chassis1
>>> ---
>>> service_type: host
>>> hostname: computehci04
>>> addr: 20.1.0.5
>>> location:
>>>   chassis: chassis2
>>> ---
>>> service_type: host
>>> hostname: computehci05
>>> addr: 20.1.0.6
>>> location:
>>>   chassis: chassis2
>>> ---
>>> service_type: host
>>> hostname: computehci06
>>> addr: 20.1.0.7
>>> location:
>>>   chassis: chassis2
>>> ---
>>> service_type: host
>>> hostname: computehci07
>>> addr: 20.1.0.8
>>> location:
>>>   chassis: chassis3
>>> ---
>>> service_type: host
>>> hostname: computehci08
>>> addr: 20.1.0.9
>>> location:
>>>   chassis: chassis3
>>> ---
>>> service_type: host
>>> hostname: computehci09
>>> addr: 20.1.0.10
>>> location:
>>>   chassis: chassis3
>>> ---
>>> service_type: host
>>> hostname: computehci10
>>> addr: 20.1.0.11
>>> location:
>>>   chassis: chassis3
>>> ---
>>> service_type: host
>>> hostname: computehci11
>>> addr: 20.1.0.12
>>> location:
>>>   chassis: chassis4
>>> ---
>>> service_type: host
>>> hostname: computehci12
>>> addr: 20.1.0.13
>>> location:
>>>   chassis: chassis4
>>> ---
>>> service_type: host
>>> hostname: computehci13
>>> addr: 20.1.0.14
>>> location:
>>>   chassis: chassis4
>>> ---
>>> service_type: host
>>> hostname: computehci14
>>> addr: 20.1.0.15
>>> location:
>>>   chassis: chassis4
>>> ---
>>> service_type: host
>>> hostname: computehci15
>>> addr: 20.1.0.16
>>> location:
>>>   chassis: chassis5
>>> ---
>>> service_type: host
>>> hostname: computehci16
>>> addr: 20.1.0.17
>>> location:
>>>   chassis: chassis5
>>> ---
>>> service_type: host
>>> hostname: computehci17
>>> addr: 20.1.0.18
>>> location:
>>>   chassis: chassis5
>>> ---
>>> service_type: host
>>> hostname: computehci18
>>> addr: 20.1.0.19
>>> location:
>>>   chassis: chassis5
>>> ---
>>> service_type: host
>>> hostname: computehci19
>>> addr: 20.1.0.20
>>> location:
>>>   chassis: chassis6
>>> ---
>>> service_type: host
>>> hostname: computehci20
>>> addr: 20.1.0.21
>>> location:
>>>   chassis: chassis6
>>> ---
>>> service_type: host
>>> hostname: computehci21
>>> addr: 20.1.0.22
>>> location:
>>>   chassis: chassis6
>>> ---
>>> service_type: host
>>> hostname: computehci22
>>> addr: 20.1.0.24
>>> location:
>>>   chassis: chassis7
>>> ---
>>> service_type: host
>>> hostname: computehci23
>>> addr: 20.1.0.25
>>> location:
>>>   chassis: chassis7
>>> ---
>>> service_type: host
>>> hostname: computehci24
>>> addr: 20.1.0.26
>>> location:
>>>   chassis: chassis7
>>> ---
>>> service_type: host
>>> hostname: computehci25
>>> addr: 20.1.0.28
>>> location:
>>>   chassis: chassis8
>>> ---
>>> service_type: host
>>> hostname: computehci26
>>> addr: 20.1.0.29
>>> location:
>>>   chassis: chassis8
>>> ---
>>> service_type: host
>>> hostname: computehci27
>>> addr: 20.1.0.30
>>> location:
>>>   chassis: chassis8
>>> ---
>>> service_type: host
>>> hostname: controllera
>>> addr: 20.1.0.23
>>> ---
>>> service_type: host
>>> hostname: controllerb
>>> addr: 20.1.0.27
>>> ---
>>> service_type: host
>>> hostname: controllerc
>>> addr: 20.1.0.31
>>> ---
>>> service_type: mon
>>> placement:
>>>   hosts:
>>>   - controllera
>>>   - controllerb
>>>   - controllerc
>>> ---
>>> service_type: mgr
>>> placement:
>>>   hosts:
>>>   - controllera
>>>   - controllerb
>>>   - controllerc
>>> ---
>>> service_type: osd
>>> service_id: default_drive_group
>>> placement:
>>>   hosts:
>>>   - computehci01
>>>   - computehci02
>>>   - computehci03
>>>   - computehci04
>>>   - computehci05
>>>   - computehci06
>>>   - computehci07
>>>   - computehci08
>>>   - computehci09
>>>   - computehci10
>>>   - computehci11
>>>   - computehci12
>>>   - computehci13
>>>   - computehci14
>>>   - computehci15
>>>   - computehci16
>>>   - computehci17
>>>   - computehci18
>>>   - computehci19
>>>   - computehci20
>>>   - computehci21
>>>   - computehci22
>>>   - computehci23
>>>   - computehci24
>>>   - computehci25
>>>   - computehci26
>>>   - computehci27
>>> spec:
>>>   data_devices:
>>>     rotational: 0
>>>
>>> All the OSDs were added, but the PGs are still in the unknown state.
>>>
>>> I've created a pool, but it didn't change anything.
>>>
>>> [root@controllerb ~]# ceph -s
>>>   cluster:
>>>     id:     be250ade-e1f2-11ee-a6ff-3cecef2872f0
>>>     health: HEALTH_WARN
>>>             Reduced data availability: 1 pg inactive
>>>
>>>   services:
>>>     mon: 3 daemons, quorum controllera,controllerc,controllerb (age 3h)
>>>     mgr: controllerc.jevbkl(active, since 21s), standbys: controllera.zwlolp, controllerb.vqkdga
>>>     osd: 108 osds: 108 up (since 2m), 108 in (since 24m)
>>>
>>>   data:
>>>     pools:   2 pools, 33 pgs
>>>     objects: 0 objects, 0 B
>>>     usage:   5.1 GiB used, 330 TiB / 330 TiB avail
>>>     pgs:     100.000% pgs unknown
>>>              33 unknown
>>>
>>> Did I miss something?
>>>
>>> Regards.
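For completeness, a spec file like the one above can be applied in one shot and then checked with the generic orchestrator commands below. The file name hosts-specs2.yml is the one from the listing above; everything else is standard cephadm CLI:

# ceph orch apply -i hosts-specs2.yml
# ceph orch host ls
# ceph orch ls osd
# ceph orch ps --daemon-type osd

ceph orch apply -i reads the whole multi-document YAML (hosts, mon, mgr and osd services separated by ---) and creates or updates the corresponding entries; the other three commands show whether the hosts were registered and the OSD daemons actually deployed.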