On Monday 19 August 2013 at 12:27 +0200, Olivier Bonvalet wrote:
> Hi,
>
> I have an OSD that crashes every time I try to start it (see logs below).
> Is this a known problem? And is there a way to fix it?
>
> root! taman:/var/log/ceph# grep -v ' pipe' osd.65.log
> 2013-08-19 11:07:48.478558 7f6fe367a780 0 ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 19327
> 2013-08-19 11:07:48.516363 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount FIEMAP ioctl is supported and appears to work
> 2013-08-19 11:07:48.516380 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2013-08-19 11:07:48.516514 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount did NOT detect btrfs
> 2013-08-19 11:07:48.517087 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount syscall(SYS_syncfs, fd) fully supported
> 2013-08-19 11:07:48.517389 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount found snaps <>
> 2013-08-19 11:07:49.199483 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount: enabling WRITEAHEAD journal mode: btrfs not detected
> 2013-08-19 11:07:52.191336 7f6fe367a780 1 journal _open /dev/sdk4 fd 18: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
> 2013-08-19 11:07:52.196020 7f6fe367a780 1 journal _open /dev/sdk4 fd 18: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
> 2013-08-19 11:07:52.196920 7f6fe367a780 1 journal close /dev/sdk4
> 2013-08-19 11:07:52.199908 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount FIEMAP ioctl is supported and appears to work
> 2013-08-19 11:07:52.199916 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2013-08-19 11:07:52.200058 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount did NOT detect btrfs
> 2013-08-19 11:07:52.200886 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount syscall(SYS_syncfs, fd) fully supported
> 2013-08-19 11:07:52.200919 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount found snaps <>
> 2013-08-19 11:07:52.215850 7f6fe367a780 0 filestore(/var/lib/ceph/osd/ceph-65) mount: enabling WRITEAHEAD journal mode: btrfs not detected
> 2013-08-19 11:07:52.219819 7f6fe367a780 1 journal _open /dev/sdk4 fd 26: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
> 2013-08-19 11:07:52.227420 7f6fe367a780 1 journal _open /dev/sdk4 fd 26: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
> 2013-08-19 11:07:52.500342 7f6fe367a780 0 osd.65 144201 crush map has features 262144, adjusting msgr requires for clients
> 2013-08-19 11:07:52.500353 7f6fe367a780 0 osd.65 144201 crush map has features 262144, adjusting msgr requires for osds
> 2013-08-19 11:08:13.581709 7f6fbdcb5700 -1 osd/OSD.cc: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f6fbdcb5700 time 2013-08-19 11:08:13.579519
> osd/OSD.cc: 4844: FAILED assert(_get_map_bl(epoch, bl))
>
>  ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
>  1: (OSDService::get_map(unsigned int)+0x44b) [0x6f5b9b]
>  2: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x3c8) [0x6f8f48]
>  3: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x31f) [0x6f975f]
>  4: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x14) [0x7391d4]
>  5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x8f8e3a]
>  6: (ThreadPool::WorkThread::entry()+0x10) [0x8fa0e0]
>  7: (()+0x6b50) [0x7f6fe3070b50]
>  8: (clone()+0x6d) [0x7f6fe15cba7d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Full logs here: http://pastebin.com/RphNyLU0
>
>

Hi,

Still the same problem with Ceph 0.61.8:

2013-08-19 23:01:54.369609 7fdd667a4780 0 osd.65 144279 crush map has features 262144, adjusting msgr requires for osds
2013-08-19 23:01:58.315115 7fdd405de700 -1 osd/OSD.cc: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fdd405de700 time 2013-08-19 23:01:58.313955
osd/OSD.cc: 4847: FAILED assert(_get_map_bl(epoch, bl))

 ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b)
 1: (OSDService::get_map(unsigned int)+0x44b) [0x6f736b]
 2: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x3c8) [0x6fa708]
 3: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x31f) [0x6faf1f]
 4: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x14) [0x73a9b4]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x8fb69a]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x8fc940]
 7: (()+0x6b50) [0x7fdd6619ab50]
 8: (clone()+0x6d) [0x7fdd646f5a7d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

(This is on Debian Wheezy, with a 3.10.5 kernel.)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com