Hi,

While trying to balance the cluster overnight, I hit the "osd full" threshold on one OSD. Now I cannot start it at all, because it says the XFS file system is full.

# df -h /dev/sdb1
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       373G  373G  100K 100% /var/lib/ceph/osd/ceph-0

How do I recover from this? A full OSD is certainly a situation to avoid (the docs raise red flags about it), but it should not mean a lost OSD, right?

Some debugging output follows. Perhaps the binary does not handle this situation in the best way?

From /var/log/ceph/ceph-osd.0.log when starting osd.0:

2013-02-08 15:07:09.430192 7f4366d55780 -1 filestore(/var/lib/ceph/osd/ceph-0) _test_fiemap failed to write to /var/lib/ceph/osd/ceph-0/fiemap_test: (28) No space left on device
2013-02-08 15:07:09.435356 7f4366d55780 -1 common/config.cc: In function 'void md_config_t::remove_observer(md_config_obs_t*)' thread 7f4366d55780 time 2013-02-08 15:07:09.430779
common/config.cc: 174: FAILED assert(found_obs)

 ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
 1: (md_config_t::remove_observer(md_config_obs_t*)+0x1e2) [0x83c892]
 2: (FileStore::umount()+0xfb) [0x6ef3ab]
 3: (OSD::do_convertfs(ObjectStore*)+0x928) [0x5f2268]
 4: (OSD::convertfs(std::string const&, std::string const&)+0x47) [0x5f23c7]
 5: (main()+0x2141) [0x5668a1]
 6: (__libc_start_main()+0xed) [0x7f4364b9a76d]
 7: /usr/bin/ceph-osd() [0x568ef9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -24> 2013-02-08 15:07:09.064409 7f4366d55780  5 asok(0x14e6000) register_command perfcounters_dump hook 0x14d9010
   -23> 2013-02-08 15:07:09.064458 7f4366d55780  5 asok(0x14e6000) register_command 1 hook 0x14d9010
   -22> 2013-02-08 15:07:09.064464 7f4366d55780  5 asok(0x14e6000) register_command perf dump hook 0x14d9010
   -21> 2013-02-08 15:07:09.064482 7f4366d55780  5 asok(0x14e6000) register_command perfcounters_schema hook 0x14d9010
   -20> 2013-02-08 15:07:09.064489 7f4366d55780  5 asok(0x14e6000) register_command 2 hook 0x14d9010
   -19> 2013-02-08 15:07:09.064493 7f4366d55780  5 asok(0x14e6000) register_command perf schema hook 0x14d9010
   -18> 2013-02-08 15:07:09.064502 7f4366d55780  5 asok(0x14e6000) register_command config show hook 0x14d9010
   -17> 2013-02-08 15:07:09.064509 7f4366d55780  5 asok(0x14e6000) register_command config set hook 0x14d9010
   -16> 2013-02-08 15:07:09.064514 7f4366d55780  5 asok(0x14e6000) register_command log flush hook 0x14d9010
   -15> 2013-02-08 15:07:09.064521 7f4366d55780  5 asok(0x14e6000) register_command log dump hook 0x14d9010
   -14> 2013-02-08 15:07:09.064526 7f4366d55780  5 asok(0x14e6000) register_command log reopen hook 0x14d9010
   -13> 2013-02-08 15:07:09.066961 7f4366d55780  0 ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061), process ceph-osd, pid 13903
   -12> 2013-02-08 15:07:09.083752 7f4366d55780  1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6801/13903 need_addr=1
   -11> 2013-02-08 15:07:09.083803 7f4366d55780  1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6802/13903 need_addr=1
   -10> 2013-02-08 15:07:09.083820 7f4366d55780  1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6803/13903 need_addr=1
    -9> 2013-02-08 15:07:09.084621 7f4366d55780  1 finished global_init_daemonize
    -8> 2013-02-08 15:07:09.090620 7f4366d55780  5 asok(0x14e6000) init /var/run/ceph/ceph-osd.0.asok
    -7> 2013-02-08 15:07:09.090667 7f4366d55780  5 asok(0x14e6000) bind_and_listen /var/run/ceph/ceph-osd.0.asok
    -6> 2013-02-08 15:07:09.090730 7f4366d55780  5 asok(0x14e6000) register_command 0 hook 0x14d80b0
    -5> 2013-02-08 15:07:09.090742 7f4366d55780  5 asok(0x14e6000) register_command version hook 0x14d80b0
    -4> 2013-02-08 15:07:09.090754 7f4366d55780  5 asok(0x14e6000) register_command git_version hook 0x14d80b0
    -3> 2013-02-08 15:07:09.090765 7f4366d55780  5 asok(0x14e6000) register_command help hook 0x14d90c0
    -2> 2013-02-08 15:07:09.090821 7f4362be8700  5 asok(0x14e6000) entry start
    -1> 2013-02-08 15:07:09.430192 7f4366d55780 -1 filestore(/var/lib/ceph/osd/ceph-0) _test_fiemap failed to write to /var/lib/ceph/osd/ceph-0/fiemap_test: (28) No space left on device
     0> 2013-02-08 15:07:09.435356 7f4366d55780 -1 common/config.cc: In function 'void md_config_t::remove_observer(md_config_obs_t*)' thread 7f4366d55780 time 2013-02-08 15:07:09.430779
common/config.cc: 174: FAILED assert(found_obs)

 ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
 1: (md_config_t::remove_observer(md_config_obs_t*)+0x1e2) [0x83c892]
 2: (FileStore::umount()+0xfb) [0x6ef3ab]
 3: (OSD::do_convertfs(ObjectStore*)+0x928) [0x5f2268]
 4: (OSD::convertfs(std::string const&, std::string const&)+0x47) [0x5f23c7]
 5: (main()+0x2141) [0x5668a1]
 6: (__libc_start_main()+0xed) [0x7f4364b9a76d]
 7: /usr/bin/ceph-osd() [0x568ef9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 100000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.0.log
--- end dump of recent events ---
2013-02-08 15:07:09.440211 7f4366d55780 -1 *** Caught signal (Aborted) **
 in thread 7f4366d55780

 ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
 1: /usr/bin/ceph-osd() [0x7828da]
 2: (()+0xfcb0) [0x7f43661f0cb0]
 3: (gsignal()+0x35) [0x7f4364baf425]
 4: (abort()+0x17b) [0x7f4364bb2b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f436550169d]
 6: (()+0xb5846) [0x7f43654ff846]
 7: (()+0xb5873) [0x7f43654ff873]
 8: (()+0xb596e) [0x7f43654ff96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x82ce7f]
 10: (md_config_t::remove_observer(md_config_obs_t*)+0x1e2) [0x83c892]
 11: (FileStore::umount()+0xfb) [0x6ef3ab]
 12: (OSD::do_convertfs(ObjectStore*)+0x928) [0x5f2268]
 13: (OSD::convertfs(std::string const&, std::string const&)+0x47) [0x5f23c7]
 14: (main()+0x2141) [0x5668a1]
 15: (__libc_start_main()+0xed) [0x7f4364b9a76d]
 16: /usr/bin/ceph-osd() [0x568ef9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2013-02-08 15:07:09.440211 7f4366d55780 -1 *** Caught signal (Aborted) **
 in thread 7f4366d55780

 ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
 1: /usr/bin/ceph-osd() [0x7828da]
 2: (()+0xfcb0) [0x7f43661f0cb0]
 3: (gsignal()+0x35) [0x7f4364baf425]
 4: (abort()+0x17b) [0x7f4364bb2b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f436550169d]
 6: (()+0xb5846) [0x7f43654ff846]
 7: (()+0xb5873) [0x7f43654ff873]
 8: (()+0xb596e) [0x7f43654ff96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1df) [0x82ce7f]
 10: (md_config_t::remove_observer(md_config_obs_t*)+0x1e2) [0x83c892]
 11: (FileStore::umount()+0xfb) [0x6ef3ab]
 12: (OSD::do_convertfs(ObjectStore*)+0x928) [0x5f2268]
 13: (OSD::convertfs(std::string const&, std::string const&)+0x47) [0x5f23c7]
 14: (main()+0x2141) [0x5668a1]
 15: (__libc_start_main()+0xed) [0x7f4364b9a76d]
 16: /usr/bin/ceph-osd() [0x568ef9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 100000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.0.log
--- end dump of recent events ---

Ugis