Hi,
I use a Ceph test infrastructure with only two storage servers running
the OSDs. Objects are replicated between these servers:
[ceph: root@cepht001 /]# ceph osd dump | grep 'replicated size'
pool 1 '.rgw.root' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 237 flags
hashpspool stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn
last_change 239 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.meta' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn
last_change 243 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.log' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn
last_change 244 flags hashpspool stripe_width 0 application rgw
pool 6 'rbd_dup' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 975 lfor
0/975/973 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 7 'cephfs_metadata' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change
1121 lfor 0/1121/1119 flags hashpspool stripe_width 0 pg_autoscale_bias
4 pg_num_min 16 recovery_priority 5 application cephfs
pool 8 'cephfs_data' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
1005 lfor 0/1005/1003 flags hashpspool stripe_width 0 application cephfs
pool 9 'device_health_metrics' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change
11476 flags hashpspool stripe_width 0 pg_num_min 1 application
mgr_devicehealth
[ceph: root@cepht001 /]# ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]
The Ceph version is 16.2.9 (Pacific).
I reinstalled one storage server (same Ceph version). I followed these
steps (a rough command sketch follows the list):
- set the noout flag
- stop all OSDs on this server
- back up all OSD definitions in /var/lib/ceph/<fsid>/osd.X
- back up all OSD-related symlinks in
/etc/systemd/system/ceph-<fsid>.target.wants
- reinstall the OS
- reinstall cephadm, keyring, ...
- move a monitor to this server in order to recreate the
/var/lib/ceph/<fsid> and /etc/systemd/system/ceph-<fsid>.target.wants trees
- restore the OSD definitions and the OSD systemd symlinks
- systemctl daemon-reload
- systemctl restart ceph-<fsid>.target
(as an alternative, I restored the OSD definitions in
/var/lib/ceph/<fsid> and redeployed each OSD so that cephadm recreates
the systemd symlinks)
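For reference, the commands behind these steps were roughly as follows;
the paths, the unit-name glob and the osd.X placeholder are written from
memory, so please read this as a sketch rather than a literal transcript:

ceph osd set noout
systemctl stop "ceph-<fsid>@osd.*"      # stop every OSD daemon on the server being reinstalled
tar czf /root/osd-dirs.tgz /var/lib/ceph/<fsid>/osd.*    # back up the OSD data directories
tar czf /root/osd-units.tgz \
  /etc/systemd/system/ceph-<fsid>.target.wants/ceph-<fsid>@osd.*.service   # back up the systemd symlinks
# ... OS reinstall, cephadm + keyring reinstall, monitor moved back via the orchestrator ...
tar xzf /root/osd-dirs.tgz -C /         # restore the OSD definitions
tar xzf /root/osd-units.tgz -C /        # restore the systemd symlinks
systemctl daemon-reload
systemctl restart ceph-<fsid>.target
# alternative to restoring the symlinks: let cephadm recreate them
# ceph orch daemon redeploy osd.X
ceph osd unset noout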
The orchestrator reports all daemons as running and the cluster health
is OK, but all PGs are remapped and half of the objects are
misplaced... as if the restored OSDs were seen as a new and different
group of OSDs (a quick check I would run is sketched after the status
output below).
[ceph: root@cepht001 /]# ceph -s
  cluster:
    id:     1f0f76fa-7d62-43b9-b9d2-ee87da10fc32
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum cepht001,cephtstor01,cephtstor02 (age 116m)
    mgr: cepht002.bxlxvc(active, since 18m), standbys: cepht003.ldxygn, cepht001.ljtuai
    mds: 1/1 daemons up, 2 standby
    osd: 41 osds: 41 up (since 2h), 41 in (since 2h); 209 remapped pgs
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   8 pools, 209 pgs
    objects: 3.36k objects, 12 GiB
    usage:   29 GiB used, 104 TiB / 104 TiB avail
    pgs:     3361/6722 objects misplaced (50.000%)
             209 active+clean+remapped
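In case it is useful, here is how I would double-check where the
restored OSDs ended up in the CRUSH map; osd.0 below is just an example
id and I have not pasted the output here:

[ceph: root@cepht001 /]# ceph osd tree                    # are the OSDs still under the expected host buckets?
[ceph: root@cepht001 /]# ceph osd metadata 0 | grep hostname   # hostname reported by one of the restored OSDs
[ceph: root@cepht001 /]# ceph pg dump pgs_brief | head    # up vs acting sets of the remapped PGs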
How can I recover from this situation?
Is there a better way to handle an OS reinstallation than the steps I
followed?
Thanks for your help,
Patrick