Capture a log with debug_osd at 30 (yes, that's correct, 30) and see if that
sheds more light on the issue.

On Tue, Feb 14, 2017 at 6:53 AM, Alfredo Colangelo <acolangelo1@xxxxxxxxx> wrote:
> Hi Ceph experts,
>
> after updating from ceph 0.94.9 to ceph 10.2.5 on Ubuntu 14.04, 2 out of 3
> osd processes are unable to start. On another machine the same happened, but
> only to 1 out of 3 OSDs.
>
> The update procedure is done via ceph-deploy 1.5.37.
>
> It shouldn't be a permissions problem, because before updating I do a
> chown 64045:64045 on the osd disks /dev/sd[bcd] and on the (separate)
> journal partitions on ssd /dev/sda[678].
>
> When the upgrade procedure is completed, the 3 ceph-osd processes are still
> running, but if I restart them some of them refuse to start.
>
> The log /var/log/ceph/ceph-osd.271.log is full of errors like this:
>
> 2017-02-13 09:47:17.590843 7fc57248f800 0 set uid:gid to 1001:1001 (ceph:ceph)
> 2017-02-13 09:47:17.590859 7fc57248f800 0 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-osd, pid 187128
> 2017-02-13 09:47:17.591356 7fc57248f800 0 pidfile_write: ignore empty --pid-file
> 2017-02-13 09:47:17.601186 7fc57248f800 0 filestore(/var/lib/ceph/osd/ceph-271) backend xfs (magic 0x58465342)
> 2017-02-13 09:47:17.601530 7fc57248f800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-271) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2017-02-13 09:47:17.601539 7fc57248f800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-271) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> 2017-02-13 09:47:17.601553 7fc57248f800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-271) detect_features: splice is supported
> 2017-02-13 09:47:17.613611 7fc57248f800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-271) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
> 2017-02-13 09:47:17.613673 7fc57248f800 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-271) detect_feature: extsize is disabled by conf
> 2017-02-13 09:47:17.614454 7fc57248f800 1 leveldb: Recovering log #6754
> 2017-02-13 09:47:17.672544 7fc57248f800 1 leveldb: Delete type=3 #6753
> 2017-02-13 09:47:17.672662 7fc57248f800 1 leveldb: Delete type=0 #6754
> 2017-02-13 09:47:17.673640 7fc57248f800 0 filestore(/var/lib/ceph/osd/ceph-271) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
> 2017-02-13 09:47:17.684464 7fc57248f800 0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
> 2017-02-13 09:47:17.688815 7fc57248f800 0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
> 2017-02-13 09:47:17.694483 7fc57248f800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fc57248f800 time 2017-02-13 09:47:17.692735
> osd/OSD.h: 885: FAILED assert(ret)
>
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x55ea51744dab]
> 2: (OSDService::get_map(unsigned int)+0x3d) [0x55ea5114debd]
> 3: (OSD::init()+0x1ed2) [0x55ea51103872]
> 4: (main()+0x29d1) [0x55ea5106ae41]
> 5: (__libc_start_main()+0xf5) [0x7fc56f3b0f45]
> 6: (()+0x355b17) [0x55ea510b3b17]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- begin dump of recent events ---
> -29> 2017-02-13 09:47:17.587145 7fc57248f800 5 asok(0x55ea5d1f8280) register_command perfcounters_dump hook 0x55ea5d1d8050
> -28> 2017-02-13 09:47:17.587164 7fc57248f800 5 asok(0x55ea5d1f8280) register_command 1 hook 0x55ea5d1d8050
> -27> 2017-02-13 09:47:17.587166 7fc57248f800 5 asok(0x55ea5d1f8280) register_command perf dump hook 0x55ea5d1d8050
> -26> 2017-02-13 09:47:17.587168 7fc57248f800 5 asok(0x55ea5d1f8280) register_command perfcounters_schema hook 0x55ea5d1d8050
> -25> 2017-02-13 09:47:17.587170 7fc57248f800 5 asok(0x55ea5d1f8280) register_command 2 hook 0x55ea5d1d8050
> -24> 2017-02-13 09:47:17.587172 7fc57248f800 5 asok(0x55ea5d1f8280) register_command perf schema hook 0x55ea5d1d8050
> -23> 2017-02-13 09:47:17.587174 7fc57248f800 5 asok(0x55ea5d1f8280) register_command perf reset hook 0x55ea5d1d8050
> -22> 2017-02-13 09:47:17.587176 7fc57248f800 5 asok(0x55ea5d1f8280) register_command config show hook 0x55ea5d1d8050
> -21> 2017-02-13 09:47:17.587178 7fc57248f800 5 asok(0x55ea5d1f8280) register_command config set hook 0x55ea5d1d8050
> -20> 2017-02-13 09:47:17.587181 7fc57248f800 5 asok(0x55ea5d1f8280) register_command config get hook 0x55ea5d1d8050
> -19> 2017-02-13 09:47:17.587187 7fc57248f800 5 asok(0x55ea5d1f8280) register_command config diff hook 0x55ea5d1d8050
> -18> 2017-02-13 09:47:17.587189 7fc57248f800 5 asok(0x55ea5d1f8280) register_command log flush hook 0x55ea5d1d8050
> -17> 2017-02-13 09:47:17.587191 7fc57248f800 5 asok(0x55ea5d1f8280) register_command log dump hook 0x55ea5d1d8050
> -16> 2017-02-13 09:47:17.587195 7fc57248f800 5 asok(0x55ea5d1f8280) register_command log reopen hook 0x55ea5d1d8050
> -15> 2017-02-13 09:47:17.590843 7fc57248f800 0 set uid:gid to 1001:1001 (ceph:ceph)
> -14> 2017-02-13 09:47:17.590859 7fc57248f800 0 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-osd, pid 187128
> -13> 2017-02-13 09:47:17.591356 7fc57248f800 0 pidfile_write: ignore empty --pid-file
> -12> 2017-02-13 09:47:17.601186 7fc57248f800 0 filestore(/var/lib/ceph/osd/ceph-271) backend xfs (magic 0x58465342)
> -11> 2017-02-13 09:47:17.601530 7fc57248f800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-271) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
> -10> 2017-02-13 09:47:17.601539 7fc57248f800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-271) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> -9> 2017-02-13 09:47:17.601553 7fc57248f800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-271) detect_features: splice is supported
> -8> 2017-02-13 09:47:17.613611 7fc57248f800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-271) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
> -7> 2017-02-13 09:47:17.613673 7fc57248f800 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-271) detect_feature: extsize is disabled by conf
> -6> 2017-02-13 09:47:17.614454 7fc57248f800 1 leveldb: Recovering log #6754
> -5> 2017-02-13 09:47:17.672544 7fc57248f800 1 leveldb: Delete type=3 #6753
> -4> 2017-02-13 09:47:17.672662 7fc57248f800 1 leveldb: Delete type=0 #6754
> -3> 2017-02-13 09:47:17.673640 7fc57248f800 0 filestore(/var/lib/ceph/osd/ceph-271) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
> -2> 2017-02-13 09:47:17.684464 7fc57248f800 0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
> -1> 2017-02-13 09:47:17.688815 7fc57248f800 0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
> 0> 2017-02-13 09:47:17.694483 7fc57248f800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fc57248f800 time 2017-02-13 09:47:17.692735
> osd/OSD.h: 885: FAILED assert(ret)
>
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x55ea51744dab]
> 2: (OSDService::get_map(unsigned int)+0x3d) [0x55ea5114debd]
> 3: (OSD::init()+0x1ed2) [0x55ea51103872]
> 4: (main()+0x29d1) [0x55ea5106ae41]
> 5: (__libc_start_main()+0xf5) [0x7fc56f3b0f45]
> 6: (()+0x355b17) [0x55ea510b3b17]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 0 lockdep
> 0/ 0 context
> 0/ 0 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 0 buffer
> 0/ 0 timer
> 0/ 0 filer
> 0/ 1 striper
> 0/ 0 objecter
> 0/ 0 rados
> 0/ 0 rbd
> 0/ 5 rbd_mirror
> 0/ 5 rbd_replay
> 0/ 0 journaler
> 0/ 5 objectcacher
> 0/ 0 client
> 0/ 0 osd
> 0/ 0 optracker
> 0/ 0 objclass
> 0/ 0 filestore
> 0/ 0 journal
> 0/ 0 ms
> 0/ 0 mon
> 0/ 0 monc
> 0/ 0 paxos
> 0/ 0 tp
> 0/ 0 auth
> 1/ 5 crypto
> 0/ 0 finisher
> 0/ 0 heartbeatmap
> 0/ 0 perfcounter
> 0/ 0 rgw
> 1/10 civetweb
> 1/ 5 javaclient
> 0/ 0 asok
> 0/ 0 throttle
> 0/ 0 refs
> 1/ 5 xio
> 1/ 5 compressor
> 1/ 5 newstore
> 1/ 5 bluestore
> 1/ 5 bluefs
> 1/ 3 bdev
> 1/ 5 kstore
> 4/ 5 rocksdb
> 4/ 5 leveldb
> 1/ 5 kinetic
> 1/ 5 fuse
> -2/-2 (syslog threshold)
> -1/-1 (stderr threshold)
> max_recent 10000
> max_new 1000
> log_file /var/log/ceph/ceph-osd.271.log
> --- end dump of recent events ---
>
> 2017-02-13 09:47:17.696962 7fc57248f800 -1 *** Caught signal (Aborted) **
> in thread 7fc57248f800 thread_name:ceph-osd
>
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> 1: (()+0x8f2d32) [0x55ea51650d32]
> 2: (()+0x10330) [0x7fc571366330]
> 3: (gsignal()+0x37) [0x7fc56f3c5c37]
> 4: (abort()+0x148) [0x7fc56f3c9028]
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x55ea51744f85]
> 6: (OSDService::get_map(unsigned int)+0x3d) [0x55ea5114debd]
> 7: (OSD::init()+0x1ed2) [0x55ea51103872]
> 8: (main()+0x29d1) [0x55ea5106ae41]
> 9: (__libc_start_main()+0xf5) [0x7fc56f3b0f45]
> 10: (()+0x355b17) [0x55ea510b3b17]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- begin dump of recent events ---
> 0> 2017-02-13 09:47:17.696962 7fc57248f800 -1 *** Caught signal (Aborted) **
> in thread 7fc57248f800 thread_name:ceph-osd
>
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> 1: (()+0x8f2d32) [0x55ea51650d32]
> 2: (()+0x10330) [0x7fc571366330]
> 3: (gsignal()+0x37) [0x7fc56f3c5c37]
> 4: (abort()+0x148) [0x7fc56f3c9028]
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x55ea51744f85]
> 6: (OSDService::get_map(unsigned int)+0x3d) [0x55ea5114debd]
> 7: (OSD::init()+0x1ed2) [0x55ea51103872]
> 8: (main()+0x29d1) [0x55ea5106ae41]
> 9: (__libc_start_main()+0xf5) [0x7fc56f3b0f45]
> 10: (()+0x355b17) [0x55ea510b3b17]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 0 lockdep
> 0/ 0 context
> 0/ 0 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 0 buffer
> 0/ 0 timer
> 0/ 0 filer
> 0/ 1 striper
> 0/ 0 objecter
> 0/ 0 rados
> 0/ 0 rbd
> 0/ 5 rbd_mirror
> 0/ 5 rbd_replay
> 0/ 0 journaler
> 0/ 5 objectcacher
> 0/ 0 client
> 0/ 0 osd
> 0/ 0 optracker
> 0/ 0 objclass
> 0/ 0 filestore
> 0/ 0 journal
> 0/ 0 ms
> 0/ 0 mon
> 0/ 0 monc
> 0/ 0 paxos
> 0/ 0 tp
> 0/ 0 auth
> 1/ 5 crypto
> 0/ 0 finisher
> 0/ 0 heartbeatmap
> 0/ 0 perfcounter
> 0/ 0 rgw
> 1/10 civetweb
> 1/ 5 javaclient
> 0/ 0 asok
> 0/ 0 throttle
> 0/ 0 refs
> 1/ 5 xio
> 1/ 5 compressor
> 1/ 5 newstore
> 1/ 5 bluestore
> 1/ 5 bluefs
> 1/ 3 bdev
> 1/ 5 kstore
> 4/ 5 rocksdb
> 4/ 5 leveldb
> 1/ 5 kinetic
> 1/ 5 fuse
> -2/-2 (syslog threshold)
> -1/-1 (stderr threshold)
> max_recent 10000
> max_new 1000
> log_file /var/log/ceph/ceph-osd.271.log
> --- end dump of recent events ---
>
> Removing the osd disks, zapping and recreating them fixes the problem, but I
> don't think it's a good idea to do this for 2/3 of our 300 OSDs.
>
> Any idea on:
> 1. How to avoid the problem during the update?
> 2. How to fix the failed disks while reusing their data?
>
> Thank you!
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
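
[Editor's note: for anyone following this thread, a minimal sketch of one way to capture the debug_osd=30 log Brad asks for. The logging-levels dump above shows "0/ 0 osd", which is why the current log carries almost no OSD detail. The OSD id (271) and log path are taken from the report; the service commands assume a stock Jewel install on Ubuntu 14.04 with upstart, so adjust them to the local setup.]

  # /etc/ceph/ceph.conf -- temporarily raise OSD debug output
  [osd]
      debug osd = 30

  # then try to start the failing OSD again (upstart on Ubuntu 14.04):
  stop ceph-osd id=271 || true
  start ceph-osd id=271

  # or run the daemon in the foreground with the override on the command
  # line, without touching ceph.conf:
  ceph-osd -f --cluster ceph --id 271 --setuser ceph --setgroup ceph --debug-osd 30

  # the detailed output lands in /var/log/ceph/ceph-osd.271.log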