Re: [REEF][cephadm] new cluster all pg unknown

Hi,

I found my error: it was a mismatch between the monitor IP address and the
--cluster_network option, which were in different subnets.
I had misunderstood the --cluster_network option. I thought that when
creating a cluster, the monitor IP designated the public network, and that
if I wanted to separate the public and private (cluster) networks, I only
needed to add the --cluster_network option.
Maybe I was in over my head, but sometimes it is not that clear.
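
For reference, here is a minimal sketch of how I would double-check the two
networks after bootstrap. It assumes the public subnet is 20.1.0.0/16 (the
one carrying the monitor IP) and the cluster subnet is 20.2.0.0/16; adjust
to your own addressing:

# the monitor IP passed to bootstrap must sit inside the public network
ceph config get mon public_network
# the cluster network only carries OSD replication and heartbeat traffic
ceph config get osd cluster_network
# every OSD host also needs an interface in the cluster subnet, e.g.
ip -br addr | grep '20\.2\.'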

Regards.

On Fri, Mar 15, 2024 at 07:18, wodel youchi <wodel.youchi@xxxxxxxxx> wrote:

> Hi,
>
> Note : Firewall is disabled on all hosts.
>
> Regards.
>
> On Fri, Mar 15, 2024 at 06:42, wodel youchi <wodel.youchi@xxxxxxxxx> wrote:
>
>> Hi,
>>
>> I recreated the cluster, and this is the result.
>>
>> This is my initial bootstrap
>>
>> cephadm --image 192.168.2.36:4000/ceph/ceph:v18 bootstrap \
>>   --initial-dashboard-user admin \
>>   --initial-dashboard-password adminpass --dashboard-password-noupdate \
>>   --registry-url 192.168.2.36:4000 \
>>   --registry-username admin --registry-password admin \
>>   --mon-ip 20.1.0.23 --cluster-network 20.2.0.0/16 \
>>   --ssh-private-key /root/.ssh/id_rsa --ssh-public-key /root/.ssh/id_rsa.pub \
>>   -c initial-ceph.conf
>>
>> This is my initial-ceph.conf
>> [mgr]
>> mgr/cephadm/container_image_prometheus = 192.168.2.36:4000/prometheus/prometheus:v2.43.0
>> mgr/cephadm/container_image_node_exporter = 192.168.2.36:4000/prometheus/node-exporter:v1.5.0
>> mgr/cephadm/container_image_grafana = 192.168.2.36:4000/ceph/ceph-grafana:9.4.7
>> mgr/cephadm/container_image_alertmanager = 192.168.2.36:4000/prometheus/alertmanager:v0.25.0
>>
>>
>> Then I added the other two monitor/manager hosts and applied the placements
>> # ceph orch host add controllerb 20.1.0.27 _admin
>> # ceph orch host add controllerc 20.1.0.31 _admin
>> # ceph orch apply mon --placement="3 controllera controllerb controllerc"
>> # ceph orch apply mgr --placement="3 controllera controllerb controllerc"
>>
>> Then I added node-exporter, prometheus, grafana and crash
>> Then I added osd hosts
>> # ceph orch host add computehci01 20.1.0.2
>> # ceph orch host add computehci02 20.1.0.3
>> # ceph orch host add computehci03 20.1.0.4
>> ...
>> # ceph orch host add computehci09 20.1.0.10
>> ...
>>
>> And finally I added osd daemons
>> # ceph orch daemon add osd computehci01:/dev/nvme0n1,/dev/nvme1n1,/dev/nvme2n1,/dev/nvme3n1
>> # ceph orch daemon add osd computehci02:/dev/nvme0n1,/dev/nvme1n1,/dev/nvme2n1,/dev/nvme3n1
>> ...
>>
>> I created a pool
>> # ceph osd pool create volumes replicated
>> # ceph osd pool application enable volumes rbd
>>
>> I even created cephfs pools and added an mds service, but still 100% of
>> the pgs are unknown?!
>>
>>
>> [root@controllera ~]# ceph -s
>>  cluster:
>>    id:     df914aa2-e21a-11ee-b8df-3cecef2872f0
>>    health: HEALTH_WARN
>>            1 MDSs report slow metadata IOs
>>            Reduced data availability: 4 pgs inactive
>>
>>  services:
>>    mon: 3 daemons, quorum controllera,controllerc,controllerb (age 13h)
>>    mgr: controllera.ajttxz(active, since 13h), standbys:
>> controllerb.qtixeq, controllerc.pqyqqo
>>    mds: 1/1 daemons up, 2 standby
>>    osd: 36 osds: 36 up (since 7h), 36 in (since 7h)
>>
>>  data:
>>    volumes: 1/1 healthy
>>    pools:   4 pools, 4 pgs
>>    objects: 0 objects, 0 B
>>    usage:   1.1 GiB used, 110 TiB / 110 TiB avail
>>    pgs:     100.000% pgs unknown
>>             4 unknown
>>
>> [root@controllera ~]# ceph osd pool ls detail
>> pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 46 flags
>> hashpspool,creating stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
>> pool 2 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 154 flags
>> hashpspool,creating stripe_width 0 application rbd
>> pool 3 'cephfs' replicated size 3 min_size 2 crush_rule 0 object_hash
>> rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 157 flags
>> hashpspool,creating stripe_width 0 application cephfs
>> pool 4 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0
>> object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 157
>> flags hashpspool,creating stripe_width 0 pg_autoscale_bias 4 pg_num_min 16
>> recovery_priority 5 application cephfs
>>
>> What am I missing? Why won't the PGs peer?
>>
>>
>>
>> Regards.
>>
>> On Thu, Mar 14, 2024 at 15:36, wodel youchi <wodel.youchi@xxxxxxxxx> wrote:
>>
>>> Hi,
>>>
>>> I am creating a new ceph cluster using REEF.
>>>
>>> This is my host_specs file
>>> [root@controllera config]# cat hosts-specs2.yml
>>> service_type: host
>>> hostname: computehci01
>>> addr: 20.1.0.2
>>> location:
>>>  chassis: chassis1
>>> ---
>>> service_type: host
>>> hostname: computehci02
>>> addr: 20.1.0.3
>>> location:
>>>  chassis: chassis1
>>> ---
>>> service_type: host
>>> hostname: computehci03
>>> addr: 20.1.0.4
>>> location:
>>>  chassis: chassis1
>>> ---
>>> service_type: host
>>> hostname: computehci04
>>> addr: 20.1.0.5
>>> location:
>>>  chassis: chassis2
>>> ---
>>> service_type: host
>>> hostname: computehci05
>>> addr: 20.1.0.6
>>> location:
>>>  chassis: chassis2
>>> ---
>>> service_type: host
>>> hostname: computehci06
>>> addr: 20.1.0.7
>>> location:
>>>  chassis: chassis2
>>> ---
>>> service_type: host
>>> hostname: computehci07
>>> addr: 20.1.0.8
>>> location:
>>>  chassis: chassis3
>>> ---
>>> service_type: host
>>> hostname: computehci08
>>> addr: 20.1.0.9
>>> location:
>>>  chassis: chassis3
>>> ---
>>> service_type: host
>>> hostname: computehci09
>>> addr: 20.1.0.10
>>> location:
>>>  chassis: chassis3
>>> ---
>>> service_type: host
>>> hostname: computehci10
>>> addr: 20.1.0.11
>>> location:
>>>  chassis: chassis3
>>> ---
>>> service_type: host
>>> hostname: computehci11
>>> addr: 20.1.0.12
>>> location:
>>>  chassis: chassis4
>>> ---
>>> service_type: host
>>> hostname: computehci12
>>> addr: 20.1.0.13
>>> location:
>>>  chassis: chassis4
>>> ---
>>> service_type: host
>>> hostname: computehci13
>>> addr: 20.1.0.14
>>> location:
>>>  chassis: chassis4
>>> ---
>>> service_type: host
>>> hostname: computehci14
>>> addr: 20.1.0.15
>>> location:
>>>  chassis: chassis4
>>> ---
>>> service_type: host
>>> hostname: computehci15
>>> addr: 20.1.0.16
>>> location:
>>>  chassis: chassis5
>>> ---
>>> service_type: host
>>> hostname: computehci16
>>> addr: 20.1.0.17
>>> location:
>>>  chassis: chassis5
>>> ---
>>> service_type: host
>>> hostname: computehci17
>>> addr: 20.1.0.18
>>> location:
>>>  chassis: chassis5
>>> ---
>>> service_type: host
>>> hostname: computehci18
>>> addr: 20.1.0.19
>>> location:
>>>  chassis: chassis5
>>> ---
>>> service_type: host
>>> hostname: computehci19
>>> addr: 20.1.0.20
>>> location:
>>>  chassis: chassis6
>>> ---
>>> service_type: host
>>> hostname: computehci20
>>> addr: 20.1.0.21
>>> location:
>>>  chassis: chassis6
>>> ---
>>> service_type: host
>>> hostname: computehci21
>>> addr: 20.1.0.22
>>> location:
>>>  chassis: chassis6
>>> ---
>>> service_type: host
>>> hostname: computehci22
>>> addr: 20.1.0.24
>>> location:
>>>  chassis: chassis7
>>> ---
>>> service_type: host
>>> hostname: computehci23
>>> addr: 20.1.0.25
>>> location:
>>>  chassis: chassis7
>>> ---
>>> service_type: host
>>> hostname: computehci24
>>> addr: 20.1.0.26
>>> location:
>>>  chassis: chassis7
>>> ---
>>> service_type: host
>>> hostname: computehci25
>>> addr: 20.1.0.28
>>> location:
>>>  chassis: chassis8
>>> ---
>>> service_type: host
>>> hostname: computehci26
>>> addr: 20.1.0.29
>>> location:
>>>  chassis: chassis8
>>> ---
>>> service_type: host
>>> hostname: computehci27
>>> addr: 20.1.0.30
>>> location:
>>>  chassis: chassis8
>>> ---
>>> service_type: host
>>> hostname: controllera
>>> addr: 20.1.0.23
>>> ---
>>> service_type: host
>>> hostname: controllerb
>>> addr: 20.1.0.27
>>> ---
>>> service_type: host
>>> hostname: controllerc
>>> addr: 20.1.0.31
>>> ---
>>> service_type: mon
>>> placement:
>>>  hosts:
>>>   - controllera
>>>   - controllerb
>>>   - controllerc
>>> ---
>>> service_type: mgr
>>> placement:
>>>  hosts:
>>>   - controllera
>>>   - controllerb
>>>   - controllerc
>>> ---
>>> service_type: osd
>>> service_id: default_drive_group
>>> placement:
>>>  hosts:
>>>   - computehci01
>>>   - computehci02
>>>   - computehci03
>>>   - computehci04
>>>   - computehci05
>>>   - computehci06
>>>   - computehci07
>>>   - computehci08
>>>   - computehci09
>>>   - computehci10
>>>   - computehci11
>>>   - computehci12
>>>   - computehci13
>>>   - computehci14
>>>   - computehci15
>>>   - computehci16
>>>   - computehci17
>>>   - computehci18
>>>   - computehci19
>>>   - computehci20
>>>   - computehci21
>>>   - computehci22
>>>   - computehci23
>>>   - computehci24
>>>   - computehci25
>>>   - computehci26
>>>   - computehci27
>>> spec:
>>>  data_devices:
>>>    rotational: 0
>>>
>>>
>>> All OSDs were added, but the PGs are still in an unknown state.
>>>
>>> I've created a pool, but it didn't change anything.
>>>
>>> [root@controllerb ~]# ceph -s
>>>  cluster:
>>>    id:     be250ade-e1f2-11ee-a6ff-3cecef2872f0
>>>    health: HEALTH_WARN
>>>            Reduced data availability: 1 pg inactive
>>>
>>>  services:
>>>    mon: 3 daemons, quorum controllera,controllerc,controllerb (age 3h)
>>>    mgr: controllerc.jevbkl(active, since 21s), standbys:
>>> controllera.zwlolp, controllerb.vqkdga
>>>    osd: 108 osds: 108 up (since 2m), 108 in (since 24m)
>>>
>>>  data:
>>>    pools:   2 pools, 33 pgs
>>>    objects: 0 objects, 0 B
>>>    usage:   5.1 GiB used, 330 TiB / 330 TiB avail
>>>    pgs:     100.000% pgs unknown
>>>             33 unknown
>>>
>>> Did I miss something?
>>>
>>> Regards.
>>>
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



