OSDs crashing on garbage data

Hi,

I'm seeing a lot of errors like the following. The root cause appears to be a leftover collection, i.e. garbage data in the filestore. During journal replay the OSD tries to remove collection 70.520s1_head, the rmcoll fails with ENOTEMPTY, and FileStore asserts. To clean it up, I have to remove a set of empty directories by hand. The directories are old; they were created last August or September. This has happened a number of times now. Does anyone know why this is happening, and whether Ceph can be made to recover from it automatically?

Thanks,
Jeff
# ceph -v 
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)


 --- begin dump of recent events ---
   -93> 2016-02-12 04:37:07.434380 7f7a25ae3900  5 asok(0x476a000) register_command perfcounters_dump hook 0x470a050
   -92> 2016-02-12 04:37:07.434425 7f7a25ae3900  5 asok(0x476a000) register_command 1 hook 0x470a050
   -91> 2016-02-12 04:37:07.434430 7f7a25ae3900  5 asok(0x476a000) register_command perf dump hook 0x470a050
   -90> 2016-02-12 04:37:07.434438 7f7a25ae3900  5 asok(0x476a000) register_command perfcounters_schema hook 0x470a050
   -89> 2016-02-12 04:37:07.434558 7f7a25ae3900  5 asok(0x476a000) register_command 2 hook 0x470a050
   -88> 2016-02-12 04:37:07.434588 7f7a25ae3900  5 asok(0x476a000) register_command perf schema hook 0x470a050
   -87> 2016-02-12 04:37:07.434631 7f7a25ae3900  5 asok(0x476a000) register_command perf reset hook 0x470a050
   -86> 2016-02-12 04:37:07.434646 7f7a25ae3900  5 asok(0x476a000) register_command config show hook 0x470a050
   -85> 2016-02-12 04:37:07.434653 7f7a25ae3900  5 asok(0x476a000) register_command config set hook 0x470a050
   -84> 2016-02-12 04:37:07.434666 7f7a25ae3900  5 asok(0x476a000) register_command config get hook 0x470a050
   -83> 2016-02-12 04:37:07.434672 7f7a25ae3900  5 asok(0x476a000) register_command config diff hook 0x470a050
   -82> 2016-02-12 04:37:07.434687 7f7a25ae3900  5 asok(0x476a000) register_command log flush hook 0x470a050
   -81> 2016-02-12 04:37:07.434692 7f7a25ae3900  5 asok(0x476a000) register_command log dump hook 0x470a050
   -80> 2016-02-12 04:37:07.434704 7f7a25ae3900  5 asok(0x476a000) register_command log reopen hook 0x470a050
   -79> 2016-02-12 04:37:07.453069 7f7a25ae3900  0 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 1343579
   -78> 2016-02-12 04:37:07.453917 7f7a25ae3900  1 -- 10.31.0.66:0/0 learned my addr 10.31.0.66:0/0
   -77> 2016-02-12 04:37:07.453930 7f7a25ae3900  1 accepter.accepter.bind my_inst.addr is 10.31.0.66:6844/1343579 need_addr=0
   -76> 2016-02-12 04:37:07.453972 7f7a25ae3900  1 -- 10.31.0.2:0/0 learned my addr 10.31.0.2:0/0
   -75> 2016-02-12 04:37:07.453979 7f7a25ae3900  1 accepter.accepter.bind my_inst.addr is 10.31.0.2:6846/1343579 need_addr=0
   -74> 2016-02-12 04:37:07.454006 7f7a25ae3900  1 -- 10.31.0.2:0/0 learned my addr 10.31.0.2:0/0
   -73> 2016-02-12 04:37:07.454012 7f7a25ae3900  1 accepter.accepter.bind my_inst.addr is 10.31.0.2:6847/1343579 need_addr=0
   -72> 2016-02-12 04:37:07.454057 7f7a25ae3900  1 -- 10.31.0.66:0/0 learned my addr 10.31.0.66:0/0
   -71> 2016-02-12 04:37:07.454064 7f7a25ae3900  1 accepter.accepter.bind my_inst.addr is 10.31.0.66:6845/1343579 need_addr=0
   -70> 2016-02-12 04:37:07.455581 7f7a25ae3900  5 asok(0x476a000) init /var/run/ceph/ceph-osd.258.asok
   -69> 2016-02-12 04:37:07.455599 7f7a25ae3900  5 asok(0x476a000) bind_and_listen /var/run/ceph/ceph-osd.258.asok
   -68> 2016-02-12 04:37:07.455715 7f7a25ae3900  5 asok(0x476a000) register_command 0 hook 0x47060b0
   -67> 2016-02-12 04:37:07.455725 7f7a25ae3900  5 asok(0x476a000) register_command version hook 0x47060b0
   -66> 2016-02-12 04:37:07.455739 7f7a25ae3900  5 asok(0x476a000) register_command git_version hook 0x47060b0
   -65> 2016-02-12 04:37:07.455746 7f7a25ae3900  5 asok(0x476a000) register_command help hook 0x470a120
   -64> 2016-02-12 04:37:07.455748 7f7a25ae3900  5 asok(0x476a000) register_command get_command_descriptions hook 0x470a110
   -63> 2016-02-12 04:37:07.455781 7f7a25ae3900 10 monclient(hunting): build_initial_monmap
   -62> 2016-02-12 04:37:07.455810 7f7a1fbea700  5 asok(0x476a000) entry start
   -61> 2016-02-12 04:37:07.463721 7f7a25ae3900  5 adding auth protocol: cephx
   -60> 2016-02-12 04:37:07.463739 7f7a25ae3900  5 adding auth protocol: cephx
   -59> 2016-02-12 04:37:07.463862 7f7a25ae3900  5 asok(0x476a000) register_command objecter_requests hook 0x470a150
   -58> 2016-02-12 04:37:07.463927 7f7a25ae3900  1 -- 10.31.0.66:6844/1343579 messenger.start
   -57> 2016-02-12 04:37:07.464019 7f7a25ae3900  1 -- :/0 messenger.start
   -56> 2016-02-12 04:37:07.464049 7f7a25ae3900  1 -- 10.31.0.66:6845/1343579 messenger.start
   -55> 2016-02-12 04:37:07.464078 7f7a25ae3900  1 -- 10.31.0.2:6847/1343579 messenger.start
   -54> 2016-02-12 04:37:07.464104 7f7a25ae3900  1 -- 10.31.0.2:6846/1343579 messenger.start
   -53> 2016-02-12 04:37:07.464130 7f7a25ae3900  1 -- :/0 messenger.start
   -52> 2016-02-12 04:37:07.464196 7f7a25ae3900  2 osd.258 0 mounting /var/lib/ceph/osd/ceph-258 /var/lib/ceph/osd/ceph-258/journal
   -51> 2016-02-12 04:37:07.464267 7f7a25ae3900  0 filestore(/var/lib/ceph/osd/ceph-258) backend xfs (magic 0x58465342)
   -50> 2016-02-12 04:37:07.466352 7f7a25ae3900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-258) detect_features: FIEMAP ioctl is supported and appears to work
   -49> 2016-02-12 04:37:07.466365 7f7a25ae3900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-258) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
   -48> 2016-02-12 04:37:07.505678 7f7a25ae3900  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-258) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
   -47> 2016-02-12 04:37:07.506003 7f7a25ae3900  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-258) detect_feature: extsize is supported and kernel 3.13.0-65-generic >= 3.5
   -46> 2016-02-12 04:37:07.596850 7f7a25ae3900  0 filestore(/var/lib/ceph/osd/ceph-258) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
   -45> 2016-02-12 04:37:07.598696 7f7a25ae3900  2 journal open /var/lib/ceph/osd/ceph-258/journal fsid ae25d09c-69e3-452a-b055-618bb18af445 fs_op_seq 7506402
   -44> 2016-02-12 04:37:07.598747 7f7a25ae3900 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
   -43> 2016-02-12 04:37:07.598759 7f7a25ae3900  1 journal _open /var/lib/ceph/osd/ceph-258/journal fd 19: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 0
   -42> 2016-02-12 04:37:07.598865 7f7a25ae3900  2 journal read_entry 55754752 : seq 7506403 6438 bytes
   -41> 2016-02-12 04:37:07.598901 7f7a25ae3900  2 journal read_entry 55754752 : seq 7506403 6438 bytes
   -40> 2016-02-12 04:37:07.598912 7f7a25ae3900  3 journal journal_replay: applying op seq 7506403
   -39> 2016-02-12 04:37:07.602051 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506403
   -38> 2016-02-12 04:37:07.602088 7f7a25ae3900  2 journal read_entry 55762944 : seq 7506404 6519 bytes
   -37> 2016-02-12 04:37:07.602092 7f7a25ae3900  3 journal journal_replay: applying op seq 7506404
   -36> 2016-02-12 04:37:07.605125 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506404
   -35> 2016-02-12 04:37:07.605158 7f7a25ae3900  2 journal read_entry 55771136 : seq 7506405 6541 bytes
   -34> 2016-02-12 04:37:07.605163 7f7a25ae3900  3 journal journal_replay: applying op seq 7506405
   -33> 2016-02-12 04:37:07.607994 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506405
   -32> 2016-02-12 04:37:07.608009 7f7a25ae3900  2 journal read_entry 55779328 : seq 7506406 6508 bytes
   -31> 2016-02-12 04:37:07.608011 7f7a25ae3900  3 journal journal_replay: applying op seq 7506406
   -30> 2016-02-12 04:37:07.609306 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506406
   -29> 2016-02-12 04:37:07.609324 7f7a25ae3900  2 journal read_entry 55787520 : seq 7506407 6684 bytes
   -28> 2016-02-12 04:37:07.609326 7f7a25ae3900  3 journal journal_replay: applying op seq 7506407
   -27> 2016-02-12 04:37:07.610652 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506407
   -26> 2016-02-12 04:37:07.610667 7f7a25ae3900  2 journal read_entry 55795712 : seq 7506408 6743 bytes
   -25> 2016-02-12 04:37:07.610668 7f7a25ae3900  3 journal journal_replay: applying op seq 7506408
   -24> 2016-02-12 04:37:07.611973 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506408
   -23> 2016-02-12 04:37:07.611986 7f7a25ae3900  2 journal read_entry 55803904 : seq 7506409 6880 bytes
   -22> 2016-02-12 04:37:07.611988 7f7a25ae3900  3 journal journal_replay: applying op seq 7506409
   -21> 2016-02-12 04:37:07.613289 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506409
   -20> 2016-02-12 04:37:07.613303 7f7a25ae3900  2 journal read_entry 55812096 : seq 7506410 6552 bytes
   -19> 2016-02-12 04:37:07.613305 7f7a25ae3900  3 journal journal_replay: applying op seq 7506410
   -18> 2016-02-12 04:37:07.614598 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506410
   -17> 2016-02-12 04:37:07.614612 7f7a25ae3900  2 journal read_entry 55820288 : seq 7506411 6749 bytes
   -16> 2016-02-12 04:37:07.614614 7f7a25ae3900  3 journal journal_replay: applying op seq 7506411
   -15> 2016-02-12 04:37:07.615947 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506411
   -14> 2016-02-12 04:37:07.615962 7f7a25ae3900  2 journal read_entry 55828480 : seq 7506412 6870 bytes
   -13> 2016-02-12 04:37:07.615964 7f7a25ae3900  3 journal journal_replay: applying op seq 7506412
   -12> 2016-02-12 04:37:07.617341 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506412
   -11> 2016-02-12 04:37:07.617356 7f7a25ae3900  2 journal read_entry 55836672 : seq 7506413 6465 bytes
   -10> 2016-02-12 04:37:07.617357 7f7a25ae3900  3 journal journal_replay: applying op seq 7506413
    -9> 2016-02-12 04:37:07.618640 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506413
    -8> 2016-02-12 04:37:07.618652 7f7a25ae3900  2 journal read_entry 55844864 : seq 7506414 544 bytes
    -7> 2016-02-12 04:37:07.618654 7f7a25ae3900  3 journal journal_replay: applying op seq 7506414
    -6> 2016-02-12 04:37:07.618751 7f7a25ae3900  3 journal journal_replay: r = 0, op_seq now 7506414
    -5> 2016-02-12 04:37:07.618762 7f7a25ae3900  2 journal read_entry 55848960 : seq 7506415 264 bytes
    -4> 2016-02-12 04:37:07.618767 7f7a25ae3900  3 journal journal_replay: applying op seq 7506415
    -3> 2016-02-12 04:37:07.618965 7f7a25ae3900  0 filestore(/var/lib/ceph/osd/ceph-258)  error (39) Directory not empty not handled on operation 0x4743fd6 (7506415.0.1, or op 1, counting from 0)
    -2> 2016-02-12 04:37:07.618979 7f7a25ae3900  0 filestore(/var/lib/ceph/osd/ceph-258) ENOTEMPTY suggests garbage data in osd data dir
    -1> 2016-02-12 04:37:07.618981 7f7a25ae3900  0 filestore(/var/lib/ceph/osd/ceph-258)  transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "remove",
            "collection": "70.520s1_head",
            "oid": "520\/\/head\/\/70\/18446744073709551615\/1"
        },
        {
            "op_num": 1,
            "op_name": "rmcoll",
            "collection": "70.520s1_head"
        }
    ]
}

     0> 2016-02-12 04:37:07.621152 7f7a25ae3900 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f7a25ae3900 time 2016-02-12 04:37:07.619013
os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error")

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x923d12]
 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 4: (JournalingObjectStore::journal_replay(unsigned long)+0x5cb) [0x94355b]
 5: (FileStore::mount()+0x3bb6) [0x9139f6]
 6: (OSD::init()+0x259) [0x6c59b9]
 7: (main()+0x2860) [0x6527e0]
 8: (__libc_start_main()+0xf5) [0x7f7a22c21ec5]
 9: /usr/bin/ceph-osd() [0x66b887]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
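For anyone comparing notes, here is a rough, read-only sketch of the check I'd run before removing anything by hand. It assumes FileStore keeps each collection under `current/<collection>` in the OSD data directory, so the collection from the transaction dump above would be `/var/lib/ceph/osd/ceph-258/current/70.520s1_head`. `inspect_collection` is a hypothetical helper name, not a Ceph tool, and it relies on GNU find (`-mindepth`, `-empty`).

```shell
#!/bin/sh
# Read-only sketch: summarize what is left inside a FileStore collection
# directory before deciding to delete anything by hand.
# Assumption: the collection lives at <osd data>/current/<collection>,
# e.g. /var/lib/ceph/osd/ceph-258/current/70.520s1_head.
# Run it only while the OSD is stopped. It never modifies anything.
inspect_collection() {
    coll_dir=$1
    if [ ! -d "$coll_dir" ]; then
        echo "not a directory: $coll_dir" >&2
        return 1
    fi
    # Regular files would be object data; any count above zero means the
    # collection is not just empty directories.
    nfiles=$(( $(find "$coll_dir" -type f | wc -l) ))
    # All subdirectories below the collection (the collection itself excluded).
    ndirs=$(( $(find "$coll_dir" -mindepth 1 -type d | wc -l) ))
    echo "files=$nfiles subdirs=$ndirs"
    # Leaf directories with no entries at all. A parent that only contains
    # empty directories is not itself "empty", so it is not listed here.
    find "$coll_dir" -mindepth 1 -type d -empty | sort
}
```

Usage would be something like `inspect_collection /var/lib/ceph/osd/ceph-258/current/70.520s1_head` with the OSD down. A result of `files=0` would be consistent with the ENOTEMPTY on `rmcoll` being caused purely by leftover empty directories; any nonzero file count deserves a closer look before deleting.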

Jeffrey McDonald, PhD
Assistant Director for HPC Operations
Minnesota Supercomputing Institute
University of Minnesota Twin Cities
599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
117 Pleasant St SE           phone: +1 612 625-6905
Minneapolis, MN 55455        fax:   +1 612 624-8861

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
