Hi,

One simple/quick question.

In my ceph cluster, I had a disk which was in predicted failure. It was so far gone that the ceph OSD daemon crashed. After the OSD crashed, ceph moved the data correctly (or at least that's what I thought), and "ceph -s" reported HEALTH_OK. Perfect.

I tried to tell ceph to mark the OSD down: it told me the OSD was already down... fine. Then I ran this:

ID=43 ; ceph osd down $ID ; ceph auth del osd.$ID ; ceph osd rm $ID ; ceph osd crush remove osd.$ID

And immediately after that, ceph told me:

# ceph -s
    cluster 70ac4a78-46c0-45e6-8ff9-878b37f50fa1
     health HEALTH_WARN
            37 pgs backfilling
            3 pgs stuck unclean
            recovery 12086/355688 objects misplaced (3.398%)
     monmap e2: 3 mons at {ceph0=192.54.207.70:6789/0,ceph1=192.54.207.71:6789/0,ceph2=192.54.207.72:6789/0}
            election epoch 938, quorum 0,1,2 ceph0,ceph1,ceph2
     mdsmap e64: 1/1/1 up {0=ceph1=up:active}, 1 up:standby-replay, 1 up:standby
     osdmap e25455: 119 osds: 119 up, 119 in; 35 remapped pgs
      pgmap v5473702: 3212 pgs, 10 pools, 378 GB data, 97528 objects
            611 GB used, 206 TB / 207 TB avail
            12086/355688 objects misplaced (3.398%)
                3175 active+clean
                  37 active+remapped+backfilling
  client io 192 kB/s rd, 1352 kB/s wr, 117 op/s

Of course, I'm sure OSD 43 was the one that was down ;)

My question therefore is: if ceph successfully and automatically migrated the data off the down/out OSD, why does anything happen at all once I tell ceph to forget about this OSD? Was the cluster not HEALTH_OK after all?

(ceph-0.94.6-0.el7.x86_64 for now)

Thanks && regards
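
PS: for reference, here is the kind of check I guess I should have done between marking the OSD out and removing it from the CRUSH map. This is only a sketch, not what I actually ran; the grep pattern is just an illustration of matching osd.43 in the PG up/acting sets, and all commands are from the stock hammer CLI:

ID=43
# take the OSD out of data placement first (a no-op here, it was already out)
ceph osd out $ID
# wait for recovery/backfill to finish
ceph -s
# list any PGs that still have osd.$ID in their up or acting set
ceph pg dump | grep -E "\[([0-9]+,)*${ID}(,[0-9]+)*\]"
# only once nothing maps to it any more, forget about it for good
ceph osd crush remove osd.$ID
ceph auth del osd.$ID
ceph osd rm $ID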