Hi!

We are seeing a strange and problematic behavior in our 0.94.1 cluster on Ubuntu 14.04.1. We have 5 nodes with 4 OSDs each.

When rebooting one of the nodes (e.g. for a kernel upgrade), the OSDs do not seem to shut down correctly. Clients hang, and ceph osd tree shows the OSDs of that node still up. Repeated runs of ceph osd tree show them going down after a while. For instance, here osd.7 is still up, even though the machine is in the middle of the reboot cycle.

[C|root@control01] ~ ➜ ceph osd tree
# id    weight  type name       up/down reweight
-1      36.2    root default
-2      7.24            host node01
0       1.81                    osd.0   up      1
5       1.81                    osd.5   up      1
10      1.81                    osd.10  up      1
15      1.81                    osd.15  up      1
-3      7.24            host node02
1       1.81                    osd.1   up      1
6       1.81                    osd.6   up      1
11      1.81                    osd.11  up      1
16      1.81                    osd.16  up      1
-4      7.24            host node03
2       1.81                    osd.2   down    1
7       1.81                    osd.7   up      1
12      1.81                    osd.12  down    1
17      1.81                    osd.17  down    1
-5      7.24            host node04
3       1.81                    osd.3   up      1
8       1.81                    osd.8   up      1
13      1.81                    osd.13  up      1
18      1.81                    osd.18  up      1
-6      7.24            host node05
4       1.81                    osd.4   up      1
9       1.81                    osd.9   up      1
14      1.81                    osd.14  up      1
19      1.81                    osd.19  up      1

So it seems the services are either not shut down correctly when the reboot begins, or they do not get enough time to let the cluster know they are going away.

If I stop the OSDs on that node manually before the reboot, everything works as expected and clients don't notice any interruptions.

[C|root@node03] ~ ➜ service ceph-osd stop id=2
ceph-osd stop/waiting
[C|root@node03] ~ ➜ service ceph-osd stop id=7
ceph-osd stop/waiting
[C|root@node03] ~ ➜ service ceph-osd stop id=12
ceph-osd stop/waiting
[C|root@node03] ~ ➜ service ceph-osd stop id=17
ceph-osd stop/waiting
[C|root@node03] ~ ➜ reboot

The upstart file was not changed from the packaged version.

Interestingly, the same Ceph version on a different cluster does _not_ show this behavior.

Any ideas as to what is causing this or how to diagnose it?
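In case it helps with reproducing this, the check and the manual workaround can be scripted roughly as follows. This is only a sketch: the function names osds_on_host and stop_up_osds are made up for illustration, and the parsing assumes the ceph osd tree column layout shown in this mail (id, weight, name, state, reweight), which may differ on other versions.

```shell
#!/bin/sh
# Print the ids of OSDs under host $1 whose state equals $2 ("up" or "down").
# Reads `ceph osd tree` output on stdin. ASSUMPTION: rows look like
#   -4 7.24 host node03        (host bucket)
#   7  1.81 osd.7 up 1         (OSD under the most recent host row)
osds_on_host() {
    awk -v host="$1" -v state="$2" '
        # A host row starts a new section; remember whether it is ours.
        $3 == "host" { in_host = ($4 == host); next }
        # Inside our host, print the id of every OSD in the wanted state.
        in_host && $3 ~ /^osd\.[0-9]+$/ && $4 == state { print $1 }
    '
}

# Stop every OSD on host $1 that the cluster currently reports as up,
# using the same upstart invocation as in the manual steps.
# Only defined here, not called; it is meant to be run on the node
# itself right before the reboot.
stop_up_osds() {
    for id in $(ceph osd tree | osds_on_host "$1" up); do
        service ceph-osd stop id="$id"
    done
}
```

During a reboot of node03, running `ceph osd tree | osds_on_host node03 up` from another machine lists the OSDs that are still marked up (osd.7 in the output shown in this mail).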
Cheers,
Daniel