Hello List,

First of all: yes - I made mistakes. Now I am trying to recover :-/

I had a healthy 3-node cluster which I wanted to convert to a single one.
My goal was to reinstall a fresh 3-node cluster and start with 2 nodes.

I was able to turn it from a 3-node cluster into a 2-node cluster while
keeping it healthy. Then the problems began. I started to change to size=1
and min_size=1. Health was okay until that point. Then all of a sudden both
nodes got fenced... one node refused to boot, mons were missing, etc. To
make a long story short, here is where I am right now:

root@node03:~ # ceph -s
    cluster b3be313f-d0ef-42d5-80c8-6b41380a47e3
     health HEALTH_WARN
            53 pgs stale
            53 pgs stuck stale
     monmap e4: 2 mons at {0=10.15.15.3:6789/0,1=10.15.15.2:6789/0}
            election epoch 298, quorum 0,1 1,0
     osdmap e6097: 14 osds: 9 up, 9 in
      pgmap v93644673: 512 pgs, 1 pools, 1193 GB data, 304 kobjects
            1088 GB used, 32277 GB / 33366 GB avail
                 459 active+clean
                  53 stale+active+clean

root@node03:~ # ceph osd tree
ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 32.56990 root default
-2 25.35992     host node03
 0  3.57999         osd.0        up  1.00000          1.00000
 5  3.62999         osd.5        up  1.00000          1.00000
 6  3.62999         osd.6        up  1.00000          1.00000
 7  3.62999         osd.7        up  1.00000          1.00000
 8  3.62999         osd.8        up  1.00000          1.00000
19  3.62999         osd.19       up  1.00000          1.00000
20  3.62999         osd.20       up  1.00000          1.00000
-3  7.20998     host node02
 3  3.62999         osd.3        up  1.00000          1.00000
 4  3.57999         osd.4        up  1.00000          1.00000
 1        0 osd.1              down        0          1.00000
 9        0 osd.9              down        0          1.00000
10        0 osd.10             down        0          1.00000
17        0 osd.17             down        0          1.00000
18        0 osd.18             down        0          1.00000

My main mistake seems to have been this sequence of commands:
--------------------------------
ceph osd out osd.1
ceph auth del osd.1
systemctl stop ceph-osd@1
ceph osd rm 1
umount /var/lib/ceph/osd/ceph-1
ceph osd crush remove osd.1

As far as I can tell, ceph still waits for and needs data from that osd.1
(which I removed):

root@node03:~ # ceph health detail
HEALTH_WARN 53 pgs stale; 53 pgs stuck stale
pg 0.1a6 is stuck stale for 5086.552795, current state stale+active+clean, last acting [1]
pg 0.142 is stuck stale for 5086.552784, current state stale+active+clean, last acting [1]
pg 0.1e is stuck stale for 5086.552820, current state stale+active+clean, last acting [1]
pg 0.e0 is stuck stale for 5086.552855, current state stale+active+clean, last acting [1]
pg 0.1d is stuck stale for 5086.552822, current state stale+active+clean, last acting [1]
pg 0.13c is stuck stale for 5086.552791, current state stale+active+clean, last acting [1]
[...] SNIP [...]
pg 0.e9 is stuck stale for 5086.552955, current state stale+active+clean, last acting [1]
pg 0.87 is stuck stale for 5086.552939, current state stale+active+clean, last acting [1]
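If it helps, I think the same thing can be seen more compactly with the
standard pg commands (0.1a6 is just one of the stuck PGs from the list
above - I hope I am using them right):

    ceph pg dump_stuck stale   # the stuck PGs as one machine-readable list
    ceph pg map 0.1a6          # which OSDs this PG currently maps to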
When I try to start osd.1 manually, I get:
--------------------------------------------
2020-02-10 18:48:26.107444 7f9ce31dd880  0 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 10210
2020-02-10 18:48:26.134417 7f9ce31dd880  0 filestore(/var/lib/ceph/osd/ceph-1) backend xfs (magic 0x58465342)
2020-02-10 18:48:26.184202 7f9ce31dd880  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: FIEMAP ioctl is supported and appears to work
2020-02-10 18:48:26.184209 7f9ce31dd880  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2020-02-10 18:48:26.184526 7f9ce31dd880  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2020-02-10 18:48:26.184585 7f9ce31dd880  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_feature: extsize is disabled by conf
2020-02-10 18:48:26.309755 7f9ce31dd880  0 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2020-02-10 18:48:26.633926 7f9ce31dd880  1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2020-02-10 18:48:26.642185 7f9ce31dd880  1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2020-02-10 18:48:26.664273 7f9ce31dd880  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2020-02-10 18:48:26.732154 7f9ce31dd880  0 osd.1 6002 crush map has features 1107558400, adjusting msgr requires for clients
2020-02-10 18:48:26.732163 7f9ce31dd880  0 osd.1 6002 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
2020-02-10 18:48:26.732167 7f9ce31dd880  0 osd.1 6002 crush map has features 1107558400, adjusting msgr requires for osds
2020-02-10 18:48:26.732179 7f9ce31dd880  0 osd.1 6002 load_pgs
2020-02-10 18:48:31.939810 7f9ce31dd880  0 osd.1 6002 load_pgs opened 53 pgs
2020-02-10 18:48:31.940546 7f9ce31dd880 -1 osd.1 6002 log_to_monitors {default=true}
2020-02-10 18:48:31.942471 7f9ce31dd880  1 journal close /var/lib/ceph/osd/ceph-1/journal
2020-02-10 18:48:31.969205 7f9ce31dd880 -1 ** ERROR: osd init failed: (1) Operation not permitted

The OSD is still mounted:
/dev/sdg1       3.7T  127G  3.6T   4% /var/lib/ceph/osd/ceph-1

Is there any way I can get osd.1 back in?

Thanks a lot,
Mario
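PS: Since the data is obviously still on the disk, would something like
this - basically the inverse of my removal commands above - be a sane way
to bring osd.1 back? I am only guessing at the caps and the crush weight
(I took 3.62999 from its neighbours), so please correct me if this is
wrong or dangerous:
--------------------------------
# osd.1 still shows up in "ceph osd tree", so the id itself seems to
# still exist; what is missing is the cephx key and the crush entry.

# restore the key I deleted with "ceph auth del osd.1", using the
# keyring that is still on the OSD's disk
ceph auth add osd.1 osd 'allow *' mon 'allow profile osd' \
    -i /var/lib/ceph/osd/ceph-1/keyring

# put it back into the crush map under node02 and mark it in again
ceph osd crush add osd.1 3.62999 host=node02
ceph osd in osd.1

# then try to start it again
systemctl start ceph-osd@1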