When the time comes to replace an OSD, I've used the following procedure:
1) Stop the OSD, mark it down and out, and replace the drive
2) Create the new OSD data directory: ceph-osd -i N --mkfs
3) Copy the OSD's key out of the authorized keys list
4) ceph osd crush rm osd.N
5) ceph osd crush add osd.N $osd_size root=default host=$(hostname -s)
6) ceph osd in osd.N
7) service ceph start osd.N
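For reference, the steps above could be scripted roughly as follows. This is a minimal sketch, not a tested replacement tool: the function names `run`, `retire_osd` and `rebuild_osd`, the `DRY_RUN` switch and the argument order are all hypothetical, step 3 is deliberately left out because the key handling isn't spelled out, and `service ceph` assumes a sysvinit-style install. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper for the OSD replacement steps above.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: stop the daemon and mark the OSD down and out.
# The physical drive swap (and mounting the new drive) happens after this.
retire_osd() {
    osd_id=$1
    run service ceph stop "osd.$osd_id"
    run ceph osd down "osd.$osd_id"
    run ceph osd out "osd.$osd_id"
}

# Steps 2 and 4-7, once the new drive is in place.
# Step 3 (restoring the OSD's key) is NOT scripted here; it has to be
# done by hand before the final "service ceph start".
rebuild_osd() {
    osd_id=$1
    weight=$2                       # CRUSH weight, "$osd_size" in step 5
    host=${3:-$(hostname -s)}       # defaults to the short hostname, as in step 5
    run ceph-osd -i "$osd_id" --mkfs
    run ceph osd crush rm "osd.$osd_id"
    run ceph osd crush add "osd.$osd_id" "$weight" root=default "host=$host"
    run ceph osd in "osd.$osd_id"
    run service ceph start "osd.$osd_id"
}
```

Usage would be `retire_osd 40`, then the drive swap, then `rebuild_osd 40 2.73` (2.73 is a made-up weight). Splitting it into two functions keeps the manual drive swap between them, rather than pretending the whole procedure can run unattended.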
If I skip steps 4 and 5, the OSD process just keeps timing out in futex:
[pid 22822] futex(0x4604cc4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 98, {1423237460, 296281000}, ffffffff <unfinished ...>
[pid 22821] futex(0x4604cc0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 22822] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
Raising the debug level only shows this in the OSD log file:
2015-02-06 10:48:22.656012 7f9acf967700 20 osd.40 396 update_osd_stat osd_stat(62060 kB used, 2793 GB avail, 2793 GB total, peers []/[] op hist [])
2015-02-06 10:48:22.656025 7f9acf967700 5 osd.40 396 heartbeat: osd_stat(62060 kB used, 2793 GB avail, 2793 GB total, peers []/[] op hist [])
2015-02-06 10:48:23.356299 7f9ae76c7700 5 osd.40 396 tick
2015-02-06 10:48:23.356308 7f9ae76c7700 10 osd.40 396 do_waiters -- start
2015-02-06 10:48:23.356310 7f9ae76c7700 10 osd.40 396 do_waiters -- finish
2015-02-06 10:48:24.356114 7f9acf967700 20 osd.40 396 update_osd_stat osd_stat(62060 kB used, 2793 GB avail, 2793 GB total, peers []/[] op hist [])
What is ceph-osd doing here, and why does recreating the OSD in the CRUSH map change it?
Thanks for any enlightenment on this.
-Gaylord
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com