I tried restarting all the OSDs on that node; osd.70 was the only Ceph process that did not come back online. There is nothing in the ceph-osd log for osd.70, however I do see over 13,000 of these messages in kern.log:

Nov 6 19:54:27 hqosd6 kernel: [34042786.392178] XFS (sdl1): xfs_log_force: error 5 returned

(As far as I can tell, error 5 is EIO, i.e. an I/O error coming back from the underlying device.)

Does anyone have any suggestions on how I might be able to get this HD back into the cluster, or whether it is even worth trying? I have put a rough sketch of the replace-the-drive path I think applies in a P.S. at the very bottom, below the quoted message.

Thanks,
Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649
________________________________________
From: Shain Miley [smiley@xxxxxxx]
Sent: Tuesday, November 04, 2014 3:55 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: osd down

Hello,

We are running ceph version 0.80.5 with 108 OSDs. Today I noticed that one of the OSDs is down:

root@hqceph1:/var/log/ceph# ceph -s
    cluster 504b5794-34bd-44e7-a8c3-0494cf800c23
     health HEALTH_WARN crush map has legacy tunables
     monmap e1: 3 mons at {hqceph1=10.35.1.201:6789/0,hqceph2=10.35.1.203:6789/0,hqceph3=10.35.1.205:6789/0}, election epoch 146, quorum 0,1,2 hqceph1,hqceph2,hqceph3
     osdmap e7119: 108 osds: 107 up, 107 in
      pgmap v6729985: 3208 pgs, 17 pools, 81193 GB data, 21631 kobjects
            216 TB used, 171 TB / 388 TB avail
                3204 active+clean
                   4 active+clean+scrubbing
  client io 4079 kB/s wr, 8 op/s

Using 'ceph osd dump' I determined that it is osd number 70:

osd.70 down out weight 0 up_from 2668 up_thru 6886 down_at 6913 last_clean_interval [488,2665) 10.35.1.217:6814/22440 10.35.1.217:6820/22440 10.35.1.217:6824/22440 10.35.1.217:6830/22440 autoout,exists 5dbd4a14-5045-490e-859b-15533cd67568

Looking at that node, the drive is still mounted, I did not see any errors in any of the system logs, and the RAID status shows the drive as up and healthy:

root@hqosd6:~# df -h |grep 70
/dev/sdl1       3.7T  1.9T  1.9T  51%  /var/lib/ceph/osd/ceph-70

I was hoping that someone might be able to advise me on the next course of action (can I add the OSD back in? should I replace the drive altogether? etc.).

I have attached the osd log to this email.

Any suggestions would be great.

Thanks,
Shain

--
Shain Miley | Manager of Systems and Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
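
P.S. In case it is useful for anyone replying, here is a minimal sketch of what I think the replace-the-drive path would look like, adapted from the standard "removing an OSD" steps in the Ceph docs. osd.70, hqosd6 and /dev/sdl are from our cluster; the commands themselves are untested here (and assume upstart on our Ubuntu nodes for the stop command), so please treat this as a sketch rather than something I have already run:

# on hqosd6: make sure the dead daemon is really stopped
stop ceph-osd id=70

# check the underlying device before deciding whether to reuse it
dmesg | grep -i sdl
smartctl -a /dev/sdl

# if the drive is going to be replaced, remove osd.70 from the cluster
ceph osd out 70
ceph osd crush remove osd.70
ceph auth del osd.70
ceph osd rm 70

If smartctl comes back clean, I would be tempted instead to unmount the filesystem, run xfs_repair against /dev/sdl1, and try starting the daemon again, which is really the heart of my question.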