Thanks, Craig!
About the time skew: the log says the time difference should be
less than 50 ms. I set up one of my nodes as the time server, and
the others sync their time with it. I don't know why the system
time still drifts so often, especially after a reboot. Maybe it's
because all my nodes are VMware virtual machines and the software
clock is not accurate enough.
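To compare a measured offset against that 50 ms limit, a small check along these lines might help. It is only a sketch: `within_drift` is a hypothetical helper name, and it assumes the offset is supplied in milliseconds (the unit `ntpq -p` uses in its offset column).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a clock offset is within the allowed drift.
#   $1 = measured offset in milliseconds (may be negative)
#   $2 = allowed drift in milliseconds (defaults to 50, the limit in the log)
within_drift() {
    awk -v off="$1" -v max="${2:-50}" 'BEGIN {
        if (off < 0) off = -off      # compare the absolute value
        exit !(off <= max)           # exit status 0 means "within limit"
    }'
}

# Example: an offset of 12.5 ms is inside the default 50 ms window.
if within_drift 12.5; then
    echo "offset OK"
else
    echo "offset too large"
fi
```

The offset value still has to be read by hand on each monitor node; the helper only does the comparison.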
If you followed the standard setup,
each OSD is its own disk + filesystem.
/var/lib/ceph/osd/ceph-2 is in use, as the mount point for the
OSD.2 filesystem. Double check by examining the output of the
`mount` command.
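As a rough sketch of that check, the mount table can also be searched directly. `is_mounted` is a hypothetical helper name, and the script assumes a Linux system where the mount table is readable at /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a directory appears as a mount point.
#   $1 = directory to look for
#   $2 = mount table to read (defaults to /proc/mounts; assumption: Linux)
is_mounted() {
    # The second field of each mount-table line is the mount point.
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

OSD_DIR=/var/lib/ceph/osd/ceph-2
if is_mounted "$OSD_DIR"; then
    echo "$OSD_DIR is a mount point"
else
    echo "$OSD_DIR is not a mount point"
fi
```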
I get the same error when I try to rename a directory that's
used as a mount point.
Try `umount /var/lib/ceph/osd/ceph-2` instead of the mv and
rm. The fuser command is telling you that the kernel has a
filesystem mounted in that directory. Nothing else appears to
be using it, so the umount should complete successfully.
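Put together, the removal would look roughly like the sequence below. This is a dry-run sketch that only prints the commands: `osd_removal_steps` is a hypothetical helper name, the first five commands are copied from the quoted message, and the mount path assumes the standard /var/lib/ceph/osd/ceph-<id> layout.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the removal steps for one OSD id.
# The ceph and stop commands are the ones used in the quoted message;
# the final umount replaces the mv/rm that failed there.
osd_removal_steps() {
    id="$1"
    dir="/var/lib/ceph/osd/ceph-$id"   # assumption: standard OSD layout
    cat <<EOF
ceph osd out osd.$id
stop ceph-osd id=$id
ceph osd crush remove osd.$id
ceph auth del osd.$id
ceph osd rm $id
umount $dir
EOF
}

osd_removal_steps 2
```

Printing the steps first makes it easy to review them before running each one by hand as root.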
Also, you should fix that time skew on mon.ceph-node5. The
mailing list archives have several posts with good answers.
On 6/15/2013 2:14 AM, Da Chun wrote:
Hi all,
On Ubuntu 13.04 with ceph 0.61.3.
I want to remove osd.2 from my cluster. The following steps were
performed:
root@ceph-node6:~# ceph osd out osd.2
marked out osd.2.
root@ceph-node6:~# ceph -w
   health HEALTH_WARN clock skew detected on mon.ceph-node5
   monmap e1: 3 mons at {ceph-node4=172.18.46.34:6789/0,ceph-node5=172.18.46.35:6789/0,ceph-node6=172.18.46.36:6789/0}, election epoch 124, quorum 0,1,2 ceph-node4,ceph-node5,ceph-node6
   osdmap e414: 6 osds: 5 up, 5 in
   pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail
   mdsmap e102: 1/1/1 up {0=ceph-node4=up:active}
2013-06-15 16:55:22.096059 mon.0 [INF] pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail
^C
root@ceph-node6:~# stop ceph-osd id=2
ceph-osd stop/waiting
root@ceph-node6:~# ceph osd crush remove osd.2
removed item id 2 name 'osd.2' from crush map
root@ceph-node6:~# ceph auth del osd.2
updated
root@ceph-node6:~# ceph osd rm 2
removed osd.2
root@ceph-node6:~# mv /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2.bak
mv: cannot move ‘/var/lib/ceph/osd/ceph-2’ to ‘/var/lib/ceph/osd/ceph-2.bak’: Device or resource busy
Everything was working OK until the last step to remove
the osd.2 directory /var/lib/ceph/osd/ceph-2.
root@ceph-node6:~# fuser -v /var/lib/ceph/osd/ceph-2
                     USER        PID ACCESS COMMAND
/var/lib/ceph/osd/ceph-2:
                     root     kernel mount /var/lib/ceph/osd/ceph-2
////////////////// What does this mean?
root@ceph-node6:~# lsof +D /var/lib/ceph/osd/ceph-2
root@ceph-node6:~#
I restarted the system, and found that the osd.2 daemon
was still running:
root@ceph-node6:~# ps aux | grep osd
root      1264  1.4 12.3 550940 125732 ?  Ssl  16:41  0:20 /usr/bin/ceph-osd --cluster=ceph -i 2 -f
root      2876  0.0  0.0   4440    628 ?  Ss   16:44  0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh
root      2877  4.9 18.2 613780 185676 ?  Sl   16:44  1:04 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
I had to use this workaround:
root@ceph-node6:~# rm -rf /var/lib/ceph/osd/ceph-2
rm: cannot remove ‘/var/lib/ceph/osd/ceph-2’: Device or resource busy
root@ceph-node6:~# ls /var/lib/ceph/osd/ceph-2
root@ceph-node6:~# shutdown -r now
....
root@ceph-node6:~# ps aux | grep osd
root      1416  0.0  0.0   4440    628 ?  Ss   17:10  0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh
root      1417  8.9  5.8 468052  59868 ?  Sl   17:10  0:02 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
root@ceph-node6:~# rm -r /var/lib/ceph/osd/ceph-2
root@ceph-node6:~#
Any idea? HELP!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com