Hi,

I'm setting up a small Ceph 0.56.2 cluster on three 64-bit Debian 6 servers with kernel 3.7.2. My problem is that the OSDs die at startup.

First I tried to start them with the init script:

> /etc/init.d/ceph start osd.0
...
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal

> ps -ef | grep ceph
(No ceph-osd process)

I then ran it with debugging:

> ceph-osd -i 0 --debug_ms 20 --debug_osd 20 --debug_filestore 20 --debug_journal 20 -d
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2013-02-13 18:04:40.351830 7fe98cd8a760 10 -- :/0 rank.bind :/0
2013-02-13 18:04:40.351895 7fe98cd8a760 10 accepter.accepter.bind
2013-02-13 18:04:40.351910 7fe98cd8a760 10 accepter.accepter.bind bound on random port 0.0.0.0:6800/0
2013-02-13 18:04:40.351919 7fe98cd8a760 10 accepter.accepter.bind bound to 0.0.0.0:6800/0
2013-02-13 18:04:40.351930 7fe98cd8a760 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6800/8438 need_addr=1
2013-02-13 18:04:40.351935 7fe98cd8a760 10 -- :/0 rank.bind :/0
2013-02-13 18:04:40.351938 7fe98cd8a760 10 accepter.accepter.bind
2013-02-13 18:04:40.351943 7fe98cd8a760 10 accepter.accepter.bind bound on random port 0.0.0.0:6801/0
2013-02-13 18:04:40.351946 7fe98cd8a760 10 accepter.accepter.bind bound to 0.0.0.0:6801/0
2013-02-13 18:04:40.351952 7fe98cd8a760 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6801/8438 need_addr=1
2013-02-13 18:04:40.351959 7fe98cd8a760 10 -- :/0 rank.bind :/0
2013-02-13 18:04:40.351961 7fe98cd8a760 10 accepter.accepter.bind
2013-02-13 18:04:40.351966 7fe98cd8a760 10 accepter.accepter.bind bound on random port 0.0.0.0:6802/0
2013-02-13 18:04:40.351969 7fe98cd8a760 10 accepter.accepter.bind bound to 0.0.0.0:6802/0
2013-02-13 18:04:40.351975 7fe98cd8a760 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6802/8438 need_addr=1
2013-02-13 18:04:40.352636 7fe98cd8a760 5 filestore(/var/lib/ceph/osd/ceph-0) basedir /var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal
2013-02-13 18:04:40.352664 7fe98cd8a760 10 filestore(/var/lib/ceph/osd/ceph-0) mount fsid is 0ab92be4-3b42-47bc-bd88-b0e11da5b450
2013-02-13 18:04:40.426222 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is supported and appears to work
2013-02-13 18:04:40.426234 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-02-13 18:04:40.426567 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount did NOT detect btrfs
2013-02-13 18:04:40.426575 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount syncfs(2) syscall not supported
2013-02-13 18:04:40.426630 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount no syncfs(2), must use sync(2).
2013-02-13 18:04:40.426631 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount WARNING: multiple ceph-osd daemons on the same host will be slow
2013-02-13 18:04:40.426701 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount found snaps <>
2013-02-13 18:04:40.426719 7fe98cd8a760 5 filestore(/var/lib/ceph/osd/ceph-0) mount op_seq is 2
2013-02-13 18:04:40.515151 7fe98cd8a760 20 filestore (init)dbobjectmap: seq is 1
2013-02-13 18:04:40.515217 7fe98cd8a760 10 filestore(/var/lib/ceph/osd/ceph-0) open_journal at /var/lib/ceph/osd/ceph-0/journal
2013-02-13 18:04:40.515243 7fe98cd8a760 0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-02-13 18:04:40.515252 7fe98cd8a760 10 filestore(/var/lib/ceph/osd/ceph-0) list_collections
2013-02-13 18:04:40.515352 7fe98cd8a760 10 journal journal_replay fs op_seq 2
2013-02-13 18:04:40.515359 7fe98cd8a760 2 journal open /var/lib/ceph/osd/ceph-0/journal fsid 0ab92be4-3b42-47bc-bd88-b0e11da5b450 fs_op_seq 2
2013-02-13 18:04:40.515373 7fe98cd8a760 10 journal _open journal is not a block device, NOT checking disk write cache on '/var/lib/ceph/osd/ceph-0/journal'
2013-02-13 18:04:40.515385 7fe98cd8a760 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 17: 10485760000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-02-13 18:04:40.515393 7fe98cd8a760 10 journal read_header
2013-02-13 18:04:40.515409 7fe98cd8a760 10 journal header: block_size 4096 alignment 4096 max_size 10485760000
2013-02-13 18:04:40.515411 7fe98cd8a760 10 journal header: start 4096
2013-02-13 18:04:40.515412 7fe98cd8a760 10 journal write_pos 4096
2013-02-13 18:04:40.515415 7fe98cd8a760 10 journal open header.fsid = 0ab92be4-3b42-47bc-bd88-b0e11da5b450
2013-02-13 18:04:40.515434 7fe98cd8a760 2 journal read_entry 4096 : seq 2 424 bytes
2013-02-13 18:04:40.515439 7fe98cd8a760 2 journal read_entry 8192 : bad header magic, end of journal
2013-02-13 18:04:40.515443 7fe98cd8a760 10 journal open reached end of journal.
2013-02-13 18:04:40.515446 7fe98cd8a760 2 journal read_entry 8192 : bad header magic, end of journal
2013-02-13 18:04:40.515447 7fe98cd8a760 3 journal journal_replay: end of journal, done.
2013-02-13 18:04:40.515444 7fe989567700 20 filestore(/var/lib/ceph/osd/ceph-0) sync_entry waiting for max_interval 5.000000
2013-02-13 18:04:40.515457 7fe98cd8a760 10 journal _open journal is not a block device, NOT checking disk write cache on '/var/lib/ceph/osd/ceph-0/journal'
2013-02-13 18:04:40.515465 7fe98cd8a760 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 17: 10485760000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-02-13 18:04:40.515516 7fe98cd8a760 10 journal journal_start
2013-02-13 18:04:40.515545 7fe983fff700 10 journal write_finish_thread_entry enter
2013-02-13 18:04:40.515555 7fe983fff700 20 journal write_finish_thread_entry sleeping
2013-02-13 18:04:40.515550 7fe988d66700 10 journal write_thread_entry start
2013-02-13 18:04:40.515559 7fe988d66700 20 journal write_thread_entry going to sleep
2013-02-13 18:04:40.515840 7fe981ffb700 20 filestore(/var/lib/ceph/osd/ceph-0) flusher_entry start
2013-02-13 18:04:40.515851 7fe981ffb700 20 filestore(/var/lib/ceph/osd/ceph-0) flusher_entry sleeping
2013-02-13 18:04:40.515938 7fe98cd8a760 5 filestore(/var/lib/ceph/osd/ceph-0) umount /var/lib/ceph/osd/ceph-0
2013-02-13 18:04:40.515958 7fe981ffb700 20 filestore(/var/lib/ceph/osd/ceph-0) flusher_entry awoke
2013-02-13 18:04:40.515973 7fe981ffb700 20 filestore(/var/lib/ceph/osd/ceph-0) flusher_entry finish
2013-02-13 18:04:40.515991 7fe989567700 20 filestore(/var/lib/ceph/osd/ceph-0) sync_entry force_sync set
2013-02-13 18:04:40.516007 7fe989567700 10 journal commit_start max_applied_seq 2, open_ops 0
2013-02-13 18:04:40.516011 7fe989567700 10 journal commit_start blocked, all open_ops have completed
2013-02-13 18:04:40.516012 7fe989567700 10 journal commit_start nothing to do
2013-02-13 18:04:40.516015 7fe989567700 10 journal commit_start
2013-02-13 18:04:40.516199 7fe98cd8a760 10 journal journal_stop
2013-02-13 18:04:40.516338 7fe98cd8a760 1 journal close /var/lib/ceph/osd/ceph-0/journal
2013-02-13 18:04:40.516361 7fe983fff700 10 journal write_finish_thread_entry exit
2013-02-13 18:04:40.516413 7fe988d66700 20 journal write_thread_entry woke up
2013-02-13 18:04:40.516423 7fe988d66700 10 journal write_thread_entry finish

Here is my ceph.conf:

[global]
    auth cluster required = cephx
    auth service required = cephx
    auth client required = cephx
[osd]
[mon.a]
    host = hosta
    mon addr = 192.168.0.200:6789
[mon.b]
    host = hostb
    mon addr = 192.168.0.186:6789
[mon.c]
    host = hostc
    mon addr = 192.168.0.136:6789
[osd.0]
    host = hosta
    osd mkfs type=xfs
    devs = /dev/sdb1
    filestore_xattr_use_omap = 1
[osd.1]
    host = hostb
    osd mkfs type=xfs
    devs = /dev/sdb1
    filestore_xattr_use_omap = 1
[mds.a]
    host = hosta

Any ideas? I began testing with Ceph 0.56.1 and then upgraded to 0.56.2, hoping it might fix this strange problem.

Thanks and kind regards,