I have a test one-node Ceph cluster with 4 OSDs, with the plan to add a second node just before going to production. Linux 4.19.0-6-amd64 - Debian 10 - Ceph version 12.2.11.

Unfortunately, the system drive failed before that happened. I restored the system from a full backup. Since no changes had been made to the cluster configuration after that backup, I hoped it would just work. For reasons I don't understand, for the first few seconds after boot the ceph status was OK (134 active+clean, 2 active+clean+scrubbing+deep), but a minute later the status changed to:

# ceph status
  cluster:
    id:     e02f2885-946b-46c8-91d5-146dd724ecaf
    health: HEALTH_WARN
            1 filesystem is degraded
            2 osds down
            1 slice (2 osds) down
            Reduced data availability: 136 pgs inactive, 15 pgs peering

  services:
    mon: 1 daemons, quorum rbd0
    mgr: rbd0(active)
    mds: fs-1/1/1 up {0=rbd0=up:replay}
    osd: 5 osds: 1 up, 3 in

  data:
    pools:   2 pools, 136 pgs
    objects: 118.53k objects, 429GiB
    usage:   7.15TiB used, 3.77TiB / 10.9TiB avail
    pgs:     88.971% pgs unknown
             11.029% pgs not active
             121 unknown
             15  peering

# ceph osd dump
epoch 1983
fsid e02f2885-946b-46c8-91d5-146dd724ecaf
created 2019-08-16 15:14:07.783009
modified 2020-02-29 13:55:39.212461
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 27
full_ratio 0.97
backfillfull_ratio 0.94
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
pool 1 'fs_data' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1595 flags hashpspool stripe_width 0 application cephfs
pool 2 'fs_meta' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1595 flags hashpspool stripe_width 0 application cephfs
max_osd 5
osd.0 down out weight 0 up_from 1970 up_thru 1973 down_at 1975 last_clean_interval [1949,1963) 192.168.101.111:6806/440 192.168.101.111:6807/440 192.168.101.111:6808/440 192.168.101.111:6809/440 autoout,exists 78eaeb63-47c9-4962-b8ff-46607921f4f6
osd.1 down in weight 1 up_from 1970 up_thru 1970 down_at 1975 last_clean_interval [1952,1963) 192.168.101.111:6801/439 192.168.101.111:6810/439 192.168.101.111:6811/439 192.168.101.111:6812/439 exists c4c4c85d-f537-4199-823b-b7ab01c78f03
osd.2 down in weight 1 up_from 1969 up_thru 1975 down_at 1976 last_clean_interval [1946,1963) 192.168.101.111:6802/441 192.168.101.111:6803/441 192.168.101.111:6804/441 192.168.101.111:6805/441 exists bd66a9c3-bfa4-4352-816e-2e4cd86389f3
osd.3 down out weight 0 up_from 1617 up_thru 1619 down_at 1631 last_clean_interval [1602,1610) 192.168.101.111:6805/933 192.168.101.111:6806/933 192.168.101.111:6807/933 192.168.101.111:6808/933 exists f247115b-c6d5-49b1-9b0e-e799c50be379
osd.4 up in weight 1 up_from 1973 up_thru 1973 down_at 1972 last_clean_interval [1956,1963) 192.168.101.111:6813/442 192.168.101.111:6814/442 192.168.101.111:6815/442 192.168.101.111:6816/442 exists,up c208221e-1228-4247-a742-0c16ce01d38f
blacklist 192.168.101.111:6800/2636437603 expires 2020-03-01 13:26:01.809132

"ceph pg query" does not respond for any PG. I can't find any errors in journalctl or in /var/log/ceph/*.

I wonder why only osd.4 is up, what "autoout" means, why 15 PGs are stuck peering, where to look for more detailed information, and whether there is a way to restore the data.

Please help me understand what happened and how to restore the data, if that is possible.
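In case it helps with the diagnosis, below is roughly what I intend to check next on the individual OSD daemons. I am not sure these are the right steps; the systemd unit names are my assumption for a Debian 10 / Luminous setup, so corrections are welcome:

  # Why did the OSD daemons stop? Check unit state and logs since this boot.
  systemctl status ceph-osd@0 ceph-osd@1 ceph-osd@2 ceph-osd@3
  journalctl -b -u ceph-osd@1

  # Are the OSD data volumes still visible after the restore from backup?
  ceph-volume lvm list

  # More detail on the down OSDs and the inactive/peering PGs,
  # plus the admin-socket view of the one OSD that is up.
  ceph health detail
  ceph osd tree
  ceph daemon osd.4 status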