Re: ERROR: osd init failed: (1) Operation not permitted

Hello Mario,

Did this get resolved by Ceph automatically (as I guess you are using 3x replication)?
What comes to mind: there is a PG export command in Ceph with which you should be able to export the said PG from your mounted OSD and import it into another OSD, so that Ceph can get a copy...
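For example, with ceph-objectstore-tool it might look roughly like this. This is an untested sketch, not a recipe: both the source and the target OSD must be stopped first, and the pgid 0.1a6 and the osd.0 target path are only placeholders taken from your outputs below. It is written as a dry run that prints the commands instead of executing them; drop the echo once you have verified them against your cluster:

```shell
run() { echo "+ $*"; }   # dry-run helper: prints the command instead of running it

# Export the stale PG from the (stopped) osd.1 filestore...
run ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
    --journal-path /var/lib/ceph/osd/ceph-1/journal \
    --pgid 0.1a6 --op export --file /root/pg.0.1a6.export

# ...and import it into another (stopped) OSD, then start that OSD
# so Ceph can pick the PG up again.
run ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --journal-path /var/lib/ceph/osd/ceph-0/journal \
    --pgid 0.1a6 --op import --file /root/pg.0.1a6.export
```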

Hth
Mehmet

On 10 February 2020 19:42:18 CET, Ml Ml <mliebherr99@xxxxxxxxxxxxxx> wrote:
>Hello List,
>
>First of all: yes, I made mistakes. Now I am trying to recover :-/
>
>I had a healthy 3-node cluster which I wanted to convert to a single
>one.
>My goal was to reinstall a fresh 3-node cluster and start with 2 nodes.
>
>I managed to turn it from a 3-node cluster into a 2-node cluster
>while staying healthy.
>Then the problems began.
>
>I started by changing to size=1 and min_size=1.
>Health was okay until then. Then all of a sudden both nodes got
>fenced... one node refused to boot, mons were missing, etc. To make a
>long story short, here is where I am right now:
>
>
>root@node03:~ # ceph -s
>    cluster b3be313f-d0ef-42d5-80c8-6b41380a47e3
>     health HEALTH_WARN
>            53 pgs stale
>            53 pgs stuck stale
>     monmap e4: 2 mons at {0=10.15.15.3:6789/0,1=10.15.15.2:6789/0}
>            election epoch 298, quorum 0,1 1,0
>     osdmap e6097: 14 osds: 9 up, 9 in
>      pgmap v93644673: 512 pgs, 1 pools, 1193 GB data, 304 kobjects
>            1088 GB used, 32277 GB / 33366 GB avail
>                 459 active+clean
>                  53 stale+active+clean
>
>root@node03:~ # ceph osd tree
>ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
>-1 32.56990 root default
>-2 25.35992     host node03
> 0  3.57999         osd.0        up  1.00000          1.00000
> 5  3.62999         osd.5        up  1.00000          1.00000
> 6  3.62999         osd.6        up  1.00000          1.00000
> 7  3.62999         osd.7        up  1.00000          1.00000
> 8  3.62999         osd.8        up  1.00000          1.00000
>19  3.62999         osd.19       up  1.00000          1.00000
>20  3.62999         osd.20       up  1.00000          1.00000
>-3  7.20998     host node02
> 3  3.62999         osd.3        up  1.00000          1.00000
> 4  3.57999         osd.4        up  1.00000          1.00000
> 1        0 osd.1              down        0          1.00000
> 9        0 osd.9              down        0          1.00000
>10        0 osd.10             down        0          1.00000
>17        0 osd.17             down        0          1.00000
>18        0 osd.18             down        0          1.00000
>
>
>
>My main mistakes seem to have been:
>--------------------------------
>ceph osd out osd.1
>ceph auth del osd.1
>systemctl stop ceph-osd@1
>ceph osd rm 1
>umount /var/lib/ceph/osd/ceph-1
>ceph osd crush remove osd.1
>
>As far as I can tell, Ceph is waiting for and needs data from that
>osd.1 (which I removed).
>
>
>
>root@node03:~ # ceph health detail
>HEALTH_WARN 53 pgs stale; 53 pgs stuck stale
>pg 0.1a6 is stuck stale for 5086.552795, current state
>stale+active+clean, last acting [1]
>pg 0.142 is stuck stale for 5086.552784, current state
>stale+active+clean, last acting [1]
>pg 0.1e is stuck stale for 5086.552820, current state
>stale+active+clean, last acting [1]
>pg 0.e0 is stuck stale for 5086.552855, current state
>stale+active+clean, last acting [1]
>pg 0.1d is stuck stale for 5086.552822, current state
>stale+active+clean, last acting [1]
>pg 0.13c is stuck stale for 5086.552791, current state
>stale+active+clean, last acting [1]
>[...] SNIP [...]
>pg 0.e9 is stuck stale for 5086.552955, current state
>stale+active+clean, last acting [1]
>pg 0.87 is stuck stale for 5086.552939, current state
>stale+active+clean, last acting [1]
>
>
>When I try to start osd.1 manually, I get:
>--------------------------------------------
>2020-02-10 18:48:26.107444 7f9ce31dd880  0 ceph version 0.94.10
>(b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid
>10210
>2020-02-10 18:48:26.134417 7f9ce31dd880  0
>filestore(/var/lib/ceph/osd/ceph-1) backend xfs (magic 0x58465342)
>2020-02-10 18:48:26.184202 7f9ce31dd880  0
>genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
>FIEMAP ioctl is supported and appears to work
>2020-02-10 18:48:26.184209 7f9ce31dd880  0
>genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
>FIEMAP ioctl is disabled via 'filestore fiemap' config option
>2020-02-10 18:48:26.184526 7f9ce31dd880  0
>genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
>syncfs(2) syscall fully supported (by glibc and kernel)
>2020-02-10 18:48:26.184585 7f9ce31dd880  0
>xfsfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_feature: extsize
>is disabled by conf
>2020-02-10 18:48:26.309755 7f9ce31dd880  0
>filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal
>mode: checkpoint is not enabled
>2020-02-10 18:48:26.633926 7f9ce31dd880  1 journal _open
>/var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size
>4096 bytes, directio = 1, aio = 1
>2020-02-10 18:48:26.642185 7f9ce31dd880  1 journal _open
>/var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size
>4096 bytes, directio = 1, aio = 1
>2020-02-10 18:48:26.664273 7f9ce31dd880  0 <cls>
>cls/hello/cls_hello.cc:271: loading cls_hello
>2020-02-10 18:48:26.732154 7f9ce31dd880  0 osd.1 6002 crush map has
>features 1107558400, adjusting msgr requires for clients
>2020-02-10 18:48:26.732163 7f9ce31dd880  0 osd.1 6002 crush map has
>features 1107558400 was 8705, adjusting msgr requires for mons
>2020-02-10 18:48:26.732167 7f9ce31dd880  0 osd.1 6002 crush map has
>features 1107558400, adjusting msgr requires for osds
>2020-02-10 18:48:26.732179 7f9ce31dd880  0 osd.1 6002 load_pgs
>2020-02-10 18:48:31.939810 7f9ce31dd880  0 osd.1 6002 load_pgs opened
>53 pgs
>2020-02-10 18:48:31.940546 7f9ce31dd880 -1 osd.1 6002 log_to_monitors
>{default=true}
>2020-02-10 18:48:31.942471 7f9ce31dd880  1 journal close
>/var/lib/ceph/osd/ceph-1/journal
>2020-02-10 18:48:31.969205 7f9ce31dd880 -1  ** ERROR: osd
>init failed: (1) Operation not permitted
>
>It's mounted:
>/dev/sdg1       3.7T  127G  3.6T   4% /var/lib/ceph/osd/ceph-1
>
>
>Is there any way I can get osd.1 back in?
>
>Thanks a lot,
>mario
>_______________________________________________
>ceph-users mailing list -- ceph-users@xxxxxxx
>To unsubscribe send an email to ceph-users-leave@xxxxxxx
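
One more thought: since you also ran "ceph auth del osd.1" and "ceph osd rm 1", the "osd init failed: (1) Operation not permitted" is exactly what a deleted cephx key / missing osdmap entry would produce, so re-registering osd.1 might let it start again. Below is a dry-run sketch only (it prints the commands instead of executing them); the weight 3.62999 and host=node02 are read from your osd tree, and the capabilities are the usual OSD profile; verify everything against your cluster before dropping the echo:

```shell
run() { echo "+ $*"; }   # dry-run helper: prints the command instead of running it

# Re-create the osdmap entry; with osd.1 gone this should hand back id 1,
# but check the returned id before continuing.
run ceph osd create

# Restore the cephx key from the keyring still on the mounted OSD.
run ceph auth add osd.1 osd 'allow *' mon 'allow profile osd' \
    -i /var/lib/ceph/osd/ceph-1/keyring

# Put the OSD back into the CRUSH map (weight/host taken from your tree).
run ceph osd crush add osd.1 3.62999 host=node02
```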


