Re: Noob install: "rbd pool init" stuck

Hi Eugen!


How are you?

Thank you for your help!

# ceph osd tree

ID  CLASS  WEIGHT     TYPE NAME           STATUS  REWEIGHT  PRI-AFF
-1         174.62640  root default
-3         174.62640      host darkside2
 0    hdd   14.55220          osd.0           up   1.00000  1.00000
 1    hdd   14.55220          osd.1           up   1.00000  1.00000
 2    hdd   14.55220          osd.2           up   1.00000  1.00000
 3    hdd   14.55220          osd.3           up   1.00000  1.00000
 4    hdd   14.55220          osd.4           up   1.00000  1.00000
 5    hdd   14.55220          osd.5           up   1.00000  1.00000
 6    hdd   14.55220          osd.6           up   1.00000  1.00000
 7    hdd   14.55220          osd.7           up   1.00000  1.00000
 8    hdd   14.55220          osd.8           up   1.00000  1.00000
 9    hdd   14.55220          osd.9           up   1.00000  1.00000
10    hdd   14.55220          osd.10          up   1.00000  1.00000
11    hdd   14.55220          osd.11          up   1.00000  1.00000

# ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
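
If I am reading the output above correctly, all twelve OSDs sit under a single host, while replicated_rule picks one OSD per host, so a pool of size 3 can never be satisfied and the PGs stay undersized. While I am still testing on this one machine, would something along these lines be the way to let the PGs activate? (The rule name replicated_osd is just my own guess.)

# create a replicated rule with "osd" as the failure domain instead of "host"
ceph osd crush rule create-replicated replicated_osd default osd
# point the existing pool at the new rule
ceph osd pool set lgcmUnsafe crush_rule replicated_osd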

I read about --all-available-devices, but I was worried there was a slight chance it would pick up the system disks (two HDDs in RAID 1). So I went the route of manually adding the 'storage' HDDs. As for the yaml, it seemed like overkill.

But perhaps you mention it because --all-available-devices does some legwork invisibly? Would it be more sensible for me to back everything out and then run this automated command instead?
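
In case backtracking is the better option, am I guessing right that it would be roughly this? (I have not run any of it yet; the device path is just an example.)

# drain and remove the manually created OSDs
ceph orch osd rm 0 1 2 3 4 5 6 7 8 9 10 11
# zap each freed device so cephadm sees it as available again, e.g.:
ceph orch device zap darkside2 /dev/sdb --force
# then let cephadm pick up all available devices automatically
ceph orch apply osd --all-available-devices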


Cordially,
Renata.

On 18/10/2022 12:40, Eugen Block wrote:
Hi,

the command doesn't return because your PGs are inactive. It looks like you're trying to use the default replicated_rule, but it can't find a suitable placement. What does your 'ceph osd tree' look like? Please also paste your ruleset ('ceph osd crush rule dump replicated_rule'). Regarding OSD management, you could have simply let cephadm choose all available disks for you [1]:

ceph orch device ls
ceph orch apply osd --all-available-devices

Or create a service spec yaml file [2] and run 'ceph orch apply -i osd-specs.yaml' once to deploy all OSDs on all target nodes from that file.
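
A minimal spec could look roughly like this (service_id and host_pattern are just placeholders here, see [2] for the available filters):

# write a simple OSD spec that takes all rotational (HDD) data devices
cat > osd-specs.yaml <<'EOF'
service_type: osd
service_id: default_drive_group
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
EOF
# apply it once; cephadm then deploys OSDs on all matching hosts/devices
ceph orch apply -i osd-specs.yaml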

[1] https://docs.ceph.com/en/latest/cephadm/services/osd/#deploy-osds
[2] https://docs.ceph.com/en/latest/cephadm/services/osd/#examples

Quoting Renato Callado Borges <renato.callado@xxxxxxxxxxxx>:

Dear all,


I am deploying a Ceph system for the first time.

I have 3 servers, and I intend to run 1 manager, 1 mon and 12 OSDs on each.

Since they are already used in production, I selected a single machine to begin the deployment, but got stuck when initializing an rbd pool.

The host OS is CentOS 7, and cephadm allowed me to install Octopus.

These are the commands I have issued so far:

./cephadm add-repo --release octopus
./cephadm install ceph-common
cephadm bootstrap --mon-ip "X.X.X.X" # edited for privacy, real IP used.
ceph orch daemon add osd darkside2:/dev/sdb

That last add command was repeated 12 times, once for each block device to be added to Ceph storage.
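
(Roughly equivalent to a loop like this, assuming the data disks are /dev/sdb through /dev/sdm:)

# add one OSD per data disk on host darkside2
for dev in /dev/sd{b..m}; do
    ceph orch daemon add osd darkside2:$dev
done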

ceph osd pool create lgcmUnsafe 128 128

Up to this point everything seemed fine: no error messages in journalctl or in /var/log/ceph/cephadm.log. I ran ceph status after each command and the output seemed consistent.

This command, though, gets stuck forever, with no error or warning message anywhere:

rbd pool init lgcmUnsafe

I canceled the command with Ctrl+C and issued ceph status. This is the output:

  cluster:
    id:     1902a026-496d-11ed-b43e-08c0eb320ec2
    health: HEALTH_WARN
            Reduced data availability: 128 pgs inactive
            Degraded data redundancy: 128 pgs undersized

  services:
    mon: 1 daemons, quorum darkside2 (age 19h)
    mgr: darkside2.umccvh(active, since 19h)
    osd: 12 osds: 12 up (since 19h), 12 in (since 4d); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 13 objects, 0 B
    usage:   12 GiB used, 175 TiB / 175 TiB avail
    pgs:     99.225% pgs not active
             26/39 objects misplaced (66.667%)
             128 undersized+peered
             1   active+clean+remapped
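
If it helps, I can also send the output of commands like these (just guessing at what might be useful):

# overall health details, stuck PGs and the pool's settings
ceph health detail
ceph pg dump_stuck inactive
ceph osd pool get lgcmUnsafe all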



Could someone more knowledgeable help me debug this, please? Thanks in advance!


Cordially,
Renata.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx