Re: OSD dies after seconds

On Wed, 13 Feb 2013, Jesus Cuenca wrote:
> Hi,
> 
> I'm setting up a small ceph 0.56.2 cluster on 3 64-bit Debian 6
> servers with kernel 3.7.2.

This might be

	http://tracker.ceph.com/issues/3595

which is a problem with google perftools (which we use by default): the 
version shipped in squeeze is buggy.  It doesn't seem to affect all 
squeeze users, though.

Does it seg fault?
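
One rough way to check, in case it is useful: run the daemon in the
foreground and look at how the process ended. This is only a sketch in
Python, not something that ships with Ceph; describe_exit and
run_and_describe are names made up for this example, and it assumes a
POSIX system where Python reports death-by-signal as a negative return
code.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short verdict string."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known to this platform's signal module.
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_and_describe(argv):
    """Run argv in the foreground and describe how it terminated."""
    # The command's own output still goes to the terminal as usual.
    result = subprocess.run(argv)
    return describe_exit(result.returncode)
```

Calling run_and_describe(["ceph-osd", "-i", "0", "-d"]) with the same
flags as in your debug run would report "killed by SIGSEGV" if the
daemon segfaults, or its plain exit status if it shuts down on its own.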

sage


> 
> My problem is that the OSDs die. First I try to start them with the init script:
> 
> > /etc/init.d/ceph start osd.0
> ...
> starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0
> /var/lib/ceph/osd/ceph-0/journal
> 
> > ps -ef | grep ceph
> (No ceph-osd process)
> 
> I then run with debugging:
> 
> > ceph-osd -i 0 --debug_ms 20 --debug_osd 20 --debug_filestore 20 --debug_journal 20 -d
> starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0
> /var/lib/ceph/osd/ceph-0/journal
> 2013-02-13 18:04:40.351830 7fe98cd8a760 10 -- :/0 rank.bind :/0
> 2013-02-13 18:04:40.351895 7fe98cd8a760 10 accepter.accepter.bind
> 2013-02-13 18:04:40.351910 7fe98cd8a760 10 accepter.accepter.bind
> bound on random port 0.0.0.0:6800/0
> 2013-02-13 18:04:40.351919 7fe98cd8a760 10 accepter.accepter.bind
> bound to 0.0.0.0:6800/0
> 2013-02-13 18:04:40.351930 7fe98cd8a760  1 accepter.accepter.bind
> my_inst.addr is 0.0.0.0:6800/8438 need_addr=1
> 2013-02-13 18:04:40.351935 7fe98cd8a760 10 -- :/0 rank.bind :/0
> 2013-02-13 18:04:40.351938 7fe98cd8a760 10 accepter.accepter.bind
> 2013-02-13 18:04:40.351943 7fe98cd8a760 10 accepter.accepter.bind
> bound on random port 0.0.0.0:6801/0
> 2013-02-13 18:04:40.351946 7fe98cd8a760 10 accepter.accepter.bind
> bound to 0.0.0.0:6801/0
> 2013-02-13 18:04:40.351952 7fe98cd8a760  1 accepter.accepter.bind
> my_inst.addr is 0.0.0.0:6801/8438 need_addr=1
> 2013-02-13 18:04:40.351959 7fe98cd8a760 10 -- :/0 rank.bind :/0
> 2013-02-13 18:04:40.351961 7fe98cd8a760 10 accepter.accepter.bind
> 2013-02-13 18:04:40.351966 7fe98cd8a760 10 accepter.accepter.bind
> bound on random port 0.0.0.0:6802/0
> 2013-02-13 18:04:40.351969 7fe98cd8a760 10 accepter.accepter.bind
> bound to 0.0.0.0:6802/0
> 2013-02-13 18:04:40.351975 7fe98cd8a760  1 accepter.accepter.bind
> my_inst.addr is 0.0.0.0:6802/8438 need_addr=1
> 2013-02-13 18:04:40.352636 7fe98cd8a760  5
> filestore(/var/lib/ceph/osd/ceph-0) basedir /var/lib/ceph/osd/ceph-0
> journal /var/lib/ceph/osd/ceph-0/journal
> 2013-02-13 18:04:40.352664 7fe98cd8a760 10
> filestore(/var/lib/ceph/osd/ceph-0) mount fsid is
> 0ab92be4-3b42-47bc-bd88-b0e11da5b450
> 2013-02-13 18:04:40.426222 7fe98cd8a760  0
> filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is supported
> and appears to work
> 2013-02-13 18:04:40.426234 7fe98cd8a760  0
> filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is disabled via
> 'filestore fiemap' config option
> 2013-02-13 18:04:40.426567 7fe98cd8a760  0
> filestore(/var/lib/ceph/osd/ceph-0) mount did NOT detect btrfs
> 2013-02-13 18:04:40.426575 7fe98cd8a760  0
> filestore(/var/lib/ceph/osd/ceph-0) mount syncfs(2) syscall not
> supported
> 2013-02-13 18:04:40.426630 7fe98cd8a760  0
> filestore(/var/lib/ceph/osd/ceph-0) mount no syncfs(2), must use
> sync(2).
> 2013-02-13 18:04:40.426631 7fe98cd8a760  0
> filestore(/var/lib/ceph/osd/ceph-0) mount WARNING: multiple ceph-osd
> daemons on the same host will be slow
> 2013-02-13 18:04:40.426701 7fe98cd8a760  0
> filestore(/var/lib/ceph/osd/ceph-0) mount found snaps <>
> 2013-02-13 18:04:40.426719 7fe98cd8a760  5
> filestore(/var/lib/ceph/osd/ceph-0) mount op_seq is 2
> 2013-02-13 18:04:40.515151 7fe98cd8a760 20 filestore (init)dbobjectmap: seq is 1
> 2013-02-13 18:04:40.515217 7fe98cd8a760 10
> filestore(/var/lib/ceph/osd/ceph-0) open_journal at
> /var/lib/ceph/osd/ceph-0/journal
> 2013-02-13 18:04:40.515243 7fe98cd8a760  0
> filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal
> mode: btrfs not detected
> 2013-02-13 18:04:40.515252 7fe98cd8a760 10
> filestore(/var/lib/ceph/osd/ceph-0) list_collections
> 2013-02-13 18:04:40.515352 7fe98cd8a760 10 journal journal_replay fs op_seq 2
> 2013-02-13 18:04:40.515359 7fe98cd8a760  2 journal open
> /var/lib/ceph/osd/ceph-0/journal fsid
> 0ab92be4-3b42-47bc-bd88-b0e11da5b450 fs_op_seq 2
> 2013-02-13 18:04:40.515373 7fe98cd8a760 10 journal _open journal is
> not a block device, NOT checking disk write cache on
> '/var/lib/ceph/osd/ceph-0/journal'
> 2013-02-13 18:04:40.515385 7fe98cd8a760  1 journal _open
> /var/lib/ceph/osd/ceph-0/journal fd 17: 10485760000 bytes, block size
> 4096 bytes, directio = 1, aio = 0
> 2013-02-13 18:04:40.515393 7fe98cd8a760 10 journal read_header
> 2013-02-13 18:04:40.515409 7fe98cd8a760 10 journal header: block_size
> 4096 alignment 4096 max_size 10485760000
> 2013-02-13 18:04:40.515411 7fe98cd8a760 10 journal header: start 4096
> 2013-02-13 18:04:40.515412 7fe98cd8a760 10 journal  write_pos 4096
> 2013-02-13 18:04:40.515415 7fe98cd8a760 10 journal open header.fsid =
> 0ab92be4-3b42-47bc-bd88-b0e11da5b450
> 2013-02-13 18:04:40.515434 7fe98cd8a760  2 journal read_entry 4096 :
> seq 2 424 bytes
> 2013-02-13 18:04:40.515439 7fe98cd8a760  2 journal read_entry 8192 :
> bad header magic, end of journal
> 2013-02-13 18:04:40.515443 7fe98cd8a760 10 journal open reached end of journal.
> 2013-02-13 18:04:40.515446 7fe98cd8a760  2 journal read_entry 8192 :
> bad header magic, end of journal
> 2013-02-13 18:04:40.515447 7fe98cd8a760  3 journal journal_replay: end
> of journal, done.
> 2013-02-13 18:04:40.515444 7fe989567700 20
> filestore(/var/lib/ceph/osd/ceph-0) sync_entry waiting for
> max_interval 5.000000
> 2013-02-13 18:04:40.515457 7fe98cd8a760 10 journal _open journal is
> not a block device, NOT checking disk write cache on
> '/var/lib/ceph/osd/ceph-0/journal'
> 2013-02-13 18:04:40.515465 7fe98cd8a760  1 journal _open
> /var/lib/ceph/osd/ceph-0/journal fd 17: 10485760000 bytes, block size
> 4096 bytes, directio = 1, aio = 0
> 2013-02-13 18:04:40.515516 7fe98cd8a760 10 journal journal_start
> 2013-02-13 18:04:40.515545 7fe983fff700 10 journal
> write_finish_thread_entry enter
> 2013-02-13 18:04:40.515555 7fe983fff700 20 journal
> write_finish_thread_entry sleeping
> 2013-02-13 18:04:40.515550 7fe988d66700 10 journal write_thread_entry start
> 2013-02-13 18:04:40.515559 7fe988d66700 20 journal write_thread_entry
> going to sleep
> 2013-02-13 18:04:40.515840 7fe981ffb700 20
> filestore(/var/lib/ceph/osd/ceph-0) flusher_entry start
> 2013-02-13 18:04:40.515851 7fe981ffb700 20
> filestore(/var/lib/ceph/osd/ceph-0) flusher_entry sleeping
> 2013-02-13 18:04:40.515938 7fe98cd8a760  5
> filestore(/var/lib/ceph/osd/ceph-0) umount /var/lib/ceph/osd/ceph-0
> 2013-02-13 18:04:40.515958 7fe981ffb700 20
> filestore(/var/lib/ceph/osd/ceph-0) flusher_entry awoke
> 2013-02-13 18:04:40.515973 7fe981ffb700 20
> filestore(/var/lib/ceph/osd/ceph-0) flusher_entry finish
> 2013-02-13 18:04:40.515991 7fe989567700 20
> filestore(/var/lib/ceph/osd/ceph-0) sync_entry force_sync set
> 2013-02-13 18:04:40.516007 7fe989567700 10 journal commit_start
> max_applied_seq 2, open_ops 0
> 2013-02-13 18:04:40.516011 7fe989567700 10 journal commit_start
> blocked, all open_ops have completed
> 2013-02-13 18:04:40.516012 7fe989567700 10 journal commit_start nothing to do
> 2013-02-13 18:04:40.516015 7fe989567700 10 journal commit_start
> 2013-02-13 18:04:40.516199 7fe98cd8a760 10 journal journal_stop
> 2013-02-13 18:04:40.516338 7fe98cd8a760  1 journal close
> /var/lib/ceph/osd/ceph-0/journal
> 2013-02-13 18:04:40.516361 7fe983fff700 10 journal
> write_finish_thread_entry exit
> 2013-02-13 18:04:40.516413 7fe988d66700 20 journal write_thread_entry woke up
> 2013-02-13 18:04:40.516423 7fe988d66700 10 journal write_thread_entry finish
> 
> Here is my ceph.conf:
> 
> [global]
>         auth cluster required = cephx
>         auth service required = cephx
>         auth client required = cephx
> 
> [osd]
> 
> [mon.a]
>         host = hosta
>         mon addr = 192.168.0.200:6789
> 
> [mon.b]
>         host = hostb
>         mon addr = 192.168.0.186:6789
> 
> [mon.c]
>         host = hostc
>         mon addr = 192.168.0.136:6789
> 
> [osd.0]
>         host = hosta
>         osd mkfs type=xfs
>         devs = /dev/sdb1
>         filestore_xattr_use_omap = 1
> 
> [osd.1]
>         host = hostb
>         osd mkfs type=xfs
>         devs = /dev/sdb1
>         filestore_xattr_use_omap = 1
> 
> [mds.a]
>         host = hosta
> 
> Any ideas?
> 
> I began testing with ceph 0.56.1, and then upgraded to 0.56.2, hoping
> it might fix this strange problem.
> 
> Thanks and kind regards,
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 