Re: Noob install: "rbd pool init" stuck

Hi Eugen!


How are you?

Thank you for your help!

# ceph osd tree

ID  CLASS  WEIGHT     TYPE NAME           STATUS  REWEIGHT  PRI-AFF
-1         174.62640  root default
-3         174.62640      host darkside2
 0    hdd   14.55220          osd.0           up   1.00000  1.00000
 1    hdd   14.55220          osd.1           up   1.00000  1.00000
 2    hdd   14.55220          osd.2           up   1.00000  1.00000
 3    hdd   14.55220          osd.3           up   1.00000  1.00000
 4    hdd   14.55220          osd.4           up   1.00000  1.00000
 5    hdd   14.55220          osd.5           up   1.00000  1.00000
 6    hdd   14.55220          osd.6           up   1.00000  1.00000
 7    hdd   14.55220          osd.7           up   1.00000  1.00000
 8    hdd   14.55220          osd.8           up   1.00000  1.00000
 9    hdd   14.55220          osd.9           up   1.00000  1.00000
10    hdd   14.55220          osd.10          up   1.00000  1.00000
11    hdd   14.55220          osd.11          up   1.00000  1.00000

# ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
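
If I am reading the output above correctly, all twelve OSDs sit under a single host, while replicated_rule picks one OSD per host, so a pool of size 3 can never be satisfied and the PGs stay undersized. While I am still testing on this one machine, would something along these lines be the way to let the PGs activate? (The rule name replicated_osd is just my own guess.)

# create a replicated rule with "osd" as the failure domain instead of "host"
ceph osd crush rule create-replicated replicated_osd default osd
# point the existing pool at the new rule
ceph osd pool set lgcmUnsafe crush_rule replicated_osd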

I read about --all-available-devices, but I was worried there was a slight chance it would pick up the system disks (two HDDs in RAID 1). So I went the route of manually adding the 'storage' HDDs. As for the yaml, it seemed like overkill.

But perhaps you mention it because --all-available-devices does some legwork invisibly? Would it be more sensible for me to back everything out and then run this automated command instead?
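
In case backtracking is the better option, am I guessing right that it would be roughly this? (I have not run any of it yet; the device path is just an example.)

# drain and remove the manually created OSDs
ceph orch osd rm 0 1 2 3 4 5 6 7 8 9 10 11
# zap each freed device so cephadm sees it as available again, e.g.:
ceph orch device zap darkside2 /dev/sdb --force
# then let cephadm pick up all available devices automatically
ceph orch apply osd --all-available-devices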


Cordially,
Renata.

On 18/10/2022 12:40, Eugen Block wrote:
Hi,

the command doesn't return because your PGs are inactive. It looks like you're trying to use the default replicated_rule, but it can't find a suitable placement. What does your 'ceph osd tree' look like? Please also paste your ruleset ('ceph osd crush rule dump replicated_rule'). Regarding OSD management, you could have simply let cephadm choose all available disks for you [1]:

ceph orch device ls
ceph orch apply osd --all-available-devices

Or create a service spec yaml file [2] and run 'ceph orch apply -i osd-specs.yaml' once to deploy all OSDs on all target nodes from that file.
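
A minimal spec could look roughly like this (service_id and host_pattern are just placeholders here, see [2] for the available filters):

# write a simple OSD spec that takes all rotational (HDD) data devices
cat > osd-specs.yaml <<'EOF'
service_type: osd
service_id: default_drive_group
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
EOF
# apply it once; cephadm then deploys OSDs on all matching hosts/devices
ceph orch apply -i osd-specs.yaml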

[1] https://docs.ceph.com/en/latest/cephadm/services/osd/#deploy-osds
[2] https://docs.ceph.com/en/latest/cephadm/services/osd/#examples

Quoting Renato Callado Borges <renato.callado@xxxxxxxxxxxx>:

Dear all,


I am deploying a Ceph system for the first time.

I have 3 servers, and I intend to run 1 manager, 1 mon and 12 OSDs on each.

Since they are already used in production, I selected a single machine to begin the deployment, but got stuck when initializing an rbd pool.

The host OS is CentOS 7, and cephadm allowed me to install Octopus.

These are the commands I have issued so far:

./cephadm add-repo --release octopus
./cephadm install ceph-common
cephadm bootstrap --mon-ip "X.X.X.X" # edited for privacy, real IP used.
ceph orch daemon add osd darkside2:/dev/sdb

That last add command was repeated 12 times, once for each block device to be added to Ceph storage.
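
(Roughly equivalent to a loop like this, assuming the data disks are /dev/sdb through /dev/sdm:)

# add one OSD per data disk on host darkside2
for dev in /dev/sd{b..m}; do
    ceph orch daemon add osd darkside2:$dev
done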

ceph osd pool create lgcmUnsafe 128 128

Up to this point everything seemed fine: no error messages in journalctl or in /var/log/ceph/cephadm.log. I ran ceph status after each command and the output seemed consistent.

This command, though, gets stuck forever, with no error or warning message anywhere:

rbd pool init lgcmUnsafe

I canceled the command with Ctrl+C and issued ceph status. This is the output:

  cluster:
    id:     1902a026-496d-11ed-b43e-08c0eb320ec2
    health: HEALTH_WARN
            Reduced data availability: 128 pgs inactive
            Degraded data redundancy: 128 pgs undersized

  services:
    mon: 1 daemons, quorum darkside2 (age 19h)
    mgr: darkside2.umccvh(active, since 19h)
    osd: 12 osds: 12 up (since 19h), 12 in (since 4d); 1 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 13 objects, 0 B
    usage:   12 GiB used, 175 TiB / 175 TiB avail
    pgs:     99.225% pgs not active
             26/39 objects misplaced (66.667%)
             128 undersized+peered
             1   active+clean+remapped
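
If it helps, I can also send the output of commands like these (just guessing at what might be useful):

# overall health details, stuck PGs and the pool's settings
ceph health detail
ceph pg dump_stuck inactive
ceph osd pool get lgcmUnsafe all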



Could someone more knowledgeable help me debug this, please? Thanks in advance!


Cordially,
Renata.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx