Hi,
I'm seeing a lot of errors like the following. The root cause appears to be a leftover collection, i.e. garbage data in the filestore. To clean it up, I have to remove a set of empty directories. The directories are old, created last August or September. I've had this happen a number of times now. Does anyone know why this is happening, and/or whether I can have Ceph recover automatically?
Thanks,
Jeff
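
For anyone hitting the same thing, here is a minimal sketch of how the leftover empty directories might be located before removing them by hand. It only lists candidates and deletes nothing. The function name find_empty_dirs is hypothetical, and the example path assumes the collection lives under the OSD data dir in a current/ subdirectory, which should be checked on the affected OSD first.

```python
import os


def find_empty_dirs(root):
    """Return directories under root (root included) holding no files at any depth.

    Hypothetical helper for inspection only; it does not remove anything.
    """
    empty = set()
    # Bottom-up walk: every child directory is visited before its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if filenames:
            continue  # a file lives here, so this directory is not empty
        # Empty only if each subdirectory was already found to be empty.
        # Symlinked directories are not followed, so they count as content.
        if all(os.path.join(dirpath, d) in empty for d in dirnames):
            empty.add(dirpath)
    return sorted(empty)


# Example call (assumed path, adjust to the collection named in the log):
# for path in find_empty_dirs("/var/lib/ceph/osd/ceph-258/current/70.520s1_head"):
#     print(path)
```

The OSD should be stopped while inspecting its data directory, and the listing is best reviewed by eye before anything is removed.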
# ceph -v
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
--- begin dump of recent events ---
-93> 2016-02-12 04:37:07.434380 7f7a25ae3900 5 asok(0x476a000) register_command perfcounters_dump hook 0x470a050
-92> 2016-02-12 04:37:07.434425 7f7a25ae3900 5 asok(0x476a000) register_command 1 hook 0x470a050
-91> 2016-02-12 04:37:07.434430 7f7a25ae3900 5 asok(0x476a000) register_command perf dump hook 0x470a050
-90> 2016-02-12 04:37:07.434438 7f7a25ae3900 5 asok(0x476a000) register_command perfcounters_schema hook 0x470a050
-89> 2016-02-12 04:37:07.434558 7f7a25ae3900 5 asok(0x476a000) register_command 2 hook 0x470a050
-88> 2016-02-12 04:37:07.434588 7f7a25ae3900 5 asok(0x476a000) register_command perf schema hook 0x470a050
-87> 2016-02-12 04:37:07.434631 7f7a25ae3900 5 asok(0x476a000) register_command perf reset hook 0x470a050
-86> 2016-02-12 04:37:07.434646 7f7a25ae3900 5 asok(0x476a000) register_command config show hook 0x470a050
-85> 2016-02-12 04:37:07.434653 7f7a25ae3900 5 asok(0x476a000) register_command config set hook 0x470a050
-84> 2016-02-12 04:37:07.434666 7f7a25ae3900 5 asok(0x476a000) register_command config get hook 0x470a050
-83> 2016-02-12 04:37:07.434672 7f7a25ae3900 5 asok(0x476a000) register_command config diff hook 0x470a050
-82> 2016-02-12 04:37:07.434687 7f7a25ae3900 5 asok(0x476a000) register_command log flush hook 0x470a050
-81> 2016-02-12 04:37:07.434692 7f7a25ae3900 5 asok(0x476a000) register_command log dump hook 0x470a050
-80> 2016-02-12 04:37:07.434704 7f7a25ae3900 5 asok(0x476a000) register_command log reopen hook 0x470a050
-79> 2016-02-12 04:37:07.453069 7f7a25ae3900 0 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 1343579
-78> 2016-02-12 04:37:07.453917 7f7a25ae3900 1 -- 10.31.0.66:0/0 learned my addr 10.31.0.66:0/0
-77> 2016-02-12 04:37:07.453930 7f7a25ae3900 1 accepter.accepter.bind my_inst.addr is 10.31.0.66:6844/1343579 need_addr=0
-76> 2016-02-12 04:37:07.453972 7f7a25ae3900 1 -- 10.31.0.2:0/0 learned my addr 10.31.0.2:0/0
-75> 2016-02-12 04:37:07.453979 7f7a25ae3900 1 accepter.accepter.bind my_inst.addr is 10.31.0.2:6846/1343579 need_addr=0
-74> 2016-02-12 04:37:07.454006 7f7a25ae3900 1 -- 10.31.0.2:0/0 learned my addr 10.31.0.2:0/0
-73> 2016-02-12 04:37:07.454012 7f7a25ae3900 1 accepter.accepter.bind my_inst.addr is 10.31.0.2:6847/1343579 need_addr=0
-72> 2016-02-12 04:37:07.454057 7f7a25ae3900 1 -- 10.31.0.66:0/0 learned my addr 10.31.0.66:0/0
-71> 2016-02-12 04:37:07.454064 7f7a25ae3900 1 accepter.accepter.bind my_inst.addr is 10.31.0.66:6845/1343579 need_addr=0
-70> 2016-02-12 04:37:07.455581 7f7a25ae3900 5 asok(0x476a000) init /var/run/ceph/ceph-osd.258.asok
-69> 2016-02-12 04:37:07.455599 7f7a25ae3900 5 asok(0x476a000) bind_and_listen /var/run/ceph/ceph-osd.258.asok
-68> 2016-02-12 04:37:07.455715 7f7a25ae3900 5 asok(0x476a000) register_command 0 hook 0x47060b0
-67> 2016-02-12 04:37:07.455725 7f7a25ae3900 5 asok(0x476a000) register_command version hook 0x47060b0
-66> 2016-02-12 04:37:07.455739 7f7a25ae3900 5 asok(0x476a000) register_command git_version hook 0x47060b0
-65> 2016-02-12 04:37:07.455746 7f7a25ae3900 5 asok(0x476a000) register_command help hook 0x470a120
-64> 2016-02-12 04:37:07.455748 7f7a25ae3900 5 asok(0x476a000) register_command get_command_descriptions hook 0x470a110
-63> 2016-02-12 04:37:07.455781 7f7a25ae3900 10 monclient(hunting): build_initial_monmap
-62> 2016-02-12 04:37:07.455810 7f7a1fbea700 5 asok(0x476a000) entry start
-61> 2016-02-12 04:37:07.463721 7f7a25ae3900 5 adding auth protocol: cephx
-60> 2016-02-12 04:37:07.463739 7f7a25ae3900 5 adding auth protocol: cephx
-59> 2016-02-12 04:37:07.463862 7f7a25ae3900 5 asok(0x476a000) register_command objecter_requests hook 0x470a150
-58> 2016-02-12 04:37:07.463927 7f7a25ae3900 1 -- 10.31.0.66:6844/1343579 messenger.start
-57> 2016-02-12 04:37:07.464019 7f7a25ae3900 1 -- :/0 messenger.start
-56> 2016-02-12 04:37:07.464049 7f7a25ae3900 1 -- 10.31.0.66:6845/1343579 messenger.start
-55> 2016-02-12 04:37:07.464078 7f7a25ae3900 1 -- 10.31.0.2:6847/1343579 messenger.start
-54> 2016-02-12 04:37:07.464104 7f7a25ae3900 1 -- 10.31.0.2:6846/1343579 messenger.start
-53> 2016-02-12 04:37:07.464130 7f7a25ae3900 1 -- :/0 messenger.start
-52> 2016-02-12 04:37:07.464196 7f7a25ae3900 2 osd.258 0 mounting /var/lib/ceph/osd/ceph-258 /var/lib/ceph/osd/ceph-258/journal
-51> 2016-02-12 04:37:07.464267 7f7a25ae3900 0 filestore(/var/lib/ceph/osd/ceph-258) backend xfs (magic 0x58465342)
-50> 2016-02-12 04:37:07.466352 7f7a25ae3900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-258) detect_features: FIEMAP ioctl is supported and appears to work
-49> 2016-02-12 04:37:07.466365 7f7a25ae3900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-258) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
-48> 2016-02-12 04:37:07.505678 7f7a25ae3900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-258) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
-47> 2016-02-12 04:37:07.506003 7f7a25ae3900 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-258) detect_feature: extsize is supported and kernel 3.13.0-65-generic >= 3.5
-46> 2016-02-12 04:37:07.596850 7f7a25ae3900 0 filestore(/var/lib/ceph/osd/ceph-258) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
-45> 2016-02-12 04:37:07.598696 7f7a25ae3900 2 journal open /var/lib/ceph/osd/ceph-258/journal fsid ae25d09c-69e3-452a-b055-618bb18af445 fs_op_seq 7506402
-44> 2016-02-12 04:37:07.598747 7f7a25ae3900 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
-43> 2016-02-12 04:37:07.598759 7f7a25ae3900 1 journal _open /var/lib/ceph/osd/ceph-258/journal fd 19: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 0
-42> 2016-02-12 04:37:07.598865 7f7a25ae3900 2 journal read_entry 55754752 : seq 7506403 6438 bytes
-41> 2016-02-12 04:37:07.598901 7f7a25ae3900 2 journal read_entry 55754752 : seq 7506403 6438 bytes
-40> 2016-02-12 04:37:07.598912 7f7a25ae3900 3 journal journal_replay: applying op seq 7506403
-39> 2016-02-12 04:37:07.602051 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506403
-38> 2016-02-12 04:37:07.602088 7f7a25ae3900 2 journal read_entry 55762944 : seq 7506404 6519 bytes
-37> 2016-02-12 04:37:07.602092 7f7a25ae3900 3 journal journal_replay: applying op seq 7506404
-36> 2016-02-12 04:37:07.605125 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506404
-35> 2016-02-12 04:37:07.605158 7f7a25ae3900 2 journal read_entry 55771136 : seq 7506405 6541 bytes
-34> 2016-02-12 04:37:07.605163 7f7a25ae3900 3 journal journal_replay: applying op seq 7506405
-33> 2016-02-12 04:37:07.607994 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506405
-32> 2016-02-12 04:37:07.608009 7f7a25ae3900 2 journal read_entry 55779328 : seq 7506406 6508 bytes
-31> 2016-02-12 04:37:07.608011 7f7a25ae3900 3 journal journal_replay: applying op seq 7506406
-30> 2016-02-12 04:37:07.609306 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506406
-29> 2016-02-12 04:37:07.609324 7f7a25ae3900 2 journal read_entry 55787520 : seq 7506407 6684 bytes
-28> 2016-02-12 04:37:07.609326 7f7a25ae3900 3 journal journal_replay: applying op seq 7506407
-27> 2016-02-12 04:37:07.610652 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506407
-26> 2016-02-12 04:37:07.610667 7f7a25ae3900 2 journal read_entry 55795712 : seq 7506408 6743 bytes
-25> 2016-02-12 04:37:07.610668 7f7a25ae3900 3 journal journal_replay: applying op seq 7506408
-24> 2016-02-12 04:37:07.611973 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506408
-23> 2016-02-12 04:37:07.611986 7f7a25ae3900 2 journal read_entry 55803904 : seq 7506409 6880 bytes
-22> 2016-02-12 04:37:07.611988 7f7a25ae3900 3 journal journal_replay: applying op seq 7506409
-21> 2016-02-12 04:37:07.613289 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506409
-20> 2016-02-12 04:37:07.613303 7f7a25ae3900 2 journal read_entry 55812096 : seq 7506410 6552 bytes
-19> 2016-02-12 04:37:07.613305 7f7a25ae3900 3 journal journal_replay: applying op seq 7506410
-18> 2016-02-12 04:37:07.614598 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506410
-17> 2016-02-12 04:37:07.614612 7f7a25ae3900 2 journal read_entry 55820288 : seq 7506411 6749 bytes
-16> 2016-02-12 04:37:07.614614 7f7a25ae3900 3 journal journal_replay: applying op seq 7506411
-15> 2016-02-12 04:37:07.615947 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506411
-14> 2016-02-12 04:37:07.615962 7f7a25ae3900 2 journal read_entry 55828480 : seq 7506412 6870 bytes
-13> 2016-02-12 04:37:07.615964 7f7a25ae3900 3 journal journal_replay: applying op seq 7506412
-12> 2016-02-12 04:37:07.617341 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506412
-11> 2016-02-12 04:37:07.617356 7f7a25ae3900 2 journal read_entry 55836672 : seq 7506413 6465 bytes
-10> 2016-02-12 04:37:07.617357 7f7a25ae3900 3 journal journal_replay: applying op seq 7506413
-9> 2016-02-12 04:37:07.618640 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506413
-8> 2016-02-12 04:37:07.618652 7f7a25ae3900 2 journal read_entry 55844864 : seq 7506414 544 bytes
-7> 2016-02-12 04:37:07.618654 7f7a25ae3900 3 journal journal_replay: applying op seq 7506414
-6> 2016-02-12 04:37:07.618751 7f7a25ae3900 3 journal journal_replay: r = 0, op_seq now 7506414
-5> 2016-02-12 04:37:07.618762 7f7a25ae3900 2 journal read_entry 55848960 : seq 7506415 264 bytes
-4> 2016-02-12 04:37:07.618767 7f7a25ae3900 3 journal journal_replay: applying op seq 7506415
-3> 2016-02-12 04:37:07.618965 7f7a25ae3900 0 filestore(/var/lib/ceph/osd/ceph-258) error (39) Directory not empty not handled on operation 0x4743fd6 (7506415.0.1, or op 1, counting from 0)
-2> 2016-02-12 04:37:07.618979 7f7a25ae3900 0 filestore(/var/lib/ceph/osd/ceph-258) ENOTEMPTY suggests garbage data in osd data dir
-1> 2016-02-12 04:37:07.618981 7f7a25ae3900 0 filestore(/var/lib/ceph/osd/ceph-258) transaction dump:
{
"ops": [
{
"op_num": 0,
"op_name": "remove",
"collection": "70.520s1_head",
"oid": "520\/\/head\/\/70\/18446744073709551615\/1"
},
{
"op_num": 1,
"op_name": "rmcoll",
"collection": "70.520s1_head"
}
]
}
0> 2016-02-12 04:37:07.621152 7f7a25ae3900 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f7a25ae3900 time 2016-02-12 04:37:07.619013
os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error")
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
4: (JournalingObjectStore::journal_replay(unsigned long)+0x5cb) [0x94355b]
5: (FileStore::mount()+0x3bb6) [0x9139f6]
6: (OSD::init()+0x259) [0x6c59b9]
7: (main()+0x2860) [0x6527e0]
8: (__libc_start_main()+0xf5) [0x7f7a22c21ec5]
9: /usr/bin/ceph-osd() [0x66b887]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
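
To spot the affected collection quickly when this recurs, the error line and the transaction dump above can be matched up with a short script. This is only a sketch: failing_op is a hypothetical helper, and it assumes the "or op N, counting from 0" wording and the JSON layout stay as they appear in this log.

```python
import json
import re


def failing_op(error_line, dump_text):
    """Return the op dict the error line points at, or None if it cannot be found."""
    # The filestore error line ends with "... or op N, counting from 0)".
    match = re.search(r"or op (\d+), counting from 0", error_line)
    if match is None:
        return None
    index = int(match.group(1))
    # The transaction dump is plain JSON with an "ops" list.
    for op in json.loads(dump_text)["ops"]:
        if op.get("op_num") == index:
            return op
    return None


error_line = ("filestore(/var/lib/ceph/osd/ceph-258) error (39) Directory not empty "
              "not handled on operation 0x4743fd6 (7506415.0.1, or op 1, counting from 0)")
# Raw string so the JSON "\/" escapes are passed through unchanged.
dump_text = r"""
{"ops": [
  {"op_num": 0, "op_name": "remove", "collection": "70.520s1_head",
   "oid": "520\/\/head\/\/70\/18446744073709551615\/1"},
  {"op_num": 1, "op_name": "rmcoll", "collection": "70.520s1_head"}
]}
"""
op = failing_op(error_line, dump_text)
print(op["op_name"], op["collection"])  # prints: rmcoll 70.520s1_head
```

For this crash the failing operation is the rmcoll on collection 70.520s1_head, which fails with ENOTEMPTY because the collection directory still has entries in it.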
Jeffrey McDonald, PhD
Assistant Director for HPC Operations
Minnesota Supercomputing Institute
University of Minnesota Twin Cities
599 Walter Library, 117 Pleasant St SE, Minneapolis, MN 55455
email: jeffrey.mcdonald@xxxxxxxxxxx
phone: +1 612 625-6905
fax: +1 612 624-8861