Well, there were a few bugs logged around upgrades which hit a similar assert, but those
were supposedly fixed 2 years ago. It looks like Ubuntu 15.04 shipped Hammer (0.94.5), so
presumably that's what you upgraded from. The current Jewel release is 10.2.10 - I don't
know whether the problem you're seeing is fixed there, but I'd upgrade to 10.2.10 and then
open a tracker ticket if the problem still persists (a rough sketch of the usual
package-upgrade steps is at the bottom of this mail, below the quoted logs).

On Thu, Oct 26, 2017 at 9:13 AM, Gonzalo Aguilar Delgado
<gaguilar@xxxxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> I cannot tell what the previous version was, since I used the one installed
> on Ubuntu 15.04. Now 16.04.
>
> But what I can tell is that I get errors from the ceph osd and mon daemons from
> time to time. The mon problems are scary, since I have to wipe the monitor and
> then reinstall a new one. I cannot really understand what's going on. I have
> never had so many problems as after updating.
>
> Should I open a bug report?
>
> ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55d5d510b250]
>  2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x642) [0x55d5d4ade2b2]
>  3: (OSD::load_pgs()+0x75a) [0x55d5d4a3383a]
>  4: (OSD::init()+0x2026) [0x55d5d4a3ec46]
>  5: (main()+0x2d6b) [0x55d5d49b193b]
>  6: (__libc_start_main()+0xf0) [0x7f49d02e5830]
>  7: (_start()+0x29) [0x55d5d49f28c9]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 newstore
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    1/ 5 kinetic
>    1/ 5 fuse
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent 10000
>   max_new 1000
>   log_file /var/log/ceph/ceph-osd.3.log
> --- end dump of recent events ---
> 2017-10-25 22:09:58.778107 7f49d36958c0 -1 *** Caught signal (Aborted) **
>  in thread 7f49d36958c0 thread_name:ceph-osd
>
>  ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>  1: (()+0x9616ee) [0x55d5d500b6ee]
>  2: (()+0x11390) [0x7f49d235e390]
>  3: (gsignal()+0x38) [0x7f49d02fa428]
>  4: (abort()+0x16a) [0x7f49d02fc02a]
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0x55d5d510b43b]
>  6: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x642) [0x55d5d4ade2b2]
>  7: (OSD::load_pgs()+0x75a) [0x55d5d4a3383a]
>  8: (OSD::init()+0x2026) [0x55d5d4a3ec46]
>  9: (main()+0x2d6b) [0x55d5d49b193b]
>  10: (__libc_start_main()+0xf0) [0x7f49d02e5830]
>  11: (_start()+0x29) [0x55d5d49f28c9]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- begin dump of recent events ---
>      0> 2017-10-25 22:09:58.778107 7f49d36958c0 -1 *** Caught signal (Aborted) **
>  in thread 7f49d36958c0 thread_name:ceph-osd
>
>  ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>  1: (()+0x9616ee) [0x55d5d500b6ee]
>  2: (()+0x11390) [0x7f49d235e390]
>  3: (gsignal()+0x38) [0x7f49d02fa428]
>  4: (abort()+0x16a) [0x7f49d02fc02a]
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0x55d5d510b43b]
>  6: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x642) [0x55d5d4ade2b2]
>  7: (OSD::load_pgs()+0x75a) [0x55d5d4a3383a]
>  8: (OSD::init()+0x2026) [0x55d5d4a3ec46]
>  9: (main()+0x2d6b) [0x55d5d49b193b]
>  10: (__libc_start_main()+0xf0) [0x7f49d02e5830]
>  11: (_start()+0x29) [0x55d5d49f28c9]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 newstore
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    1/ 5 kinetic
>    1/ 5 fuse
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent 10000
>   max_new 1000
>   log_file /var/log/ceph/ceph-osd.3.log
> -
>
>
> On 25/10/17 00:42, Christian Wuerdig wrote:
>
> From which version of ceph to which other version of ceph did you
> upgrade? Can you provide logs from crashing OSDs? The degraded object
> percentage being larger than 100% has been reported before
> (https://www.spinics.net/lists/ceph-users/msg39519.html) and looks
> like it's been fixed a week or so ago:
> http://tracker.ceph.com/issues/21803
>
> On Mon, Oct 23, 2017 at 5:10 AM, Gonzalo Aguilar Delgado
> <gaguilar@xxxxxxxxxxxxxxxxxx> wrote:
>
> Hello,
>
> Since we upgraded the ceph cluster we are facing a lot of problems. Most of
> them are due to OSDs crashing. What can cause this?
>
>
> This morning I woke up with this message:
>
>
> root@red-compute:~# ceph -w
>     cluster 9028f4da-0d77-462b-be9b-dbdf7fa57771
>      health HEALTH_ERR
>             1 pgs are stuck inactive for more than 300 seconds
>             7 pgs inconsistent
>             1 pgs stale
>             1 pgs stuck stale
>             recovery 20266198323167232/287940 objects degraded (7038340738753.641%)
>             37154696925806626 scrub errors
>             too many PGs per OSD (305 > max 300)
>      monmap e12: 2 mons at {blue-compute=172.16.0.119:6789/0,red-compute=172.16.0.100:6789/0}
>             election epoch 4986, quorum 0,1 red-compute,blue-compute
>       fsmap e913: 1/1/1 up {0=blue-compute=up:active}
>      osdmap e8096: 5 osds: 5 up, 5 in
>             flags require_jewel_osds
>       pgmap v68755349: 764 pgs, 6 pools, 558 GB data, 140 kobjects
>             1119 GB used, 3060 GB / 4179 GB avail
>             20266198323167232/287940 objects degraded (7038340738753.641%)
>                  756 active+clean
>                    7 active+clean+inconsistent
>                    1 stale+active+clean
>   client io 1630 B/s rd, 552 kB/s wr, 0 op/s rd, 64 op/s wr
>
> 2017-10-22 18:10:13.000812 mon.0 [INF] pgmap v68755348: 764 pgs: 7 active+clean+inconsistent,
> 756 active+clean, 1 stale+active+clean; 558 GB data, 1119 GB used, 3060 GB / 4179 GB avail;
> 1641 B/s rd, 229 kB/s wr, 39 op/s; 20266198323167232/287940 objects degraded (7038340738753.641%)
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
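For what it's worth, the point-release upgrade suggested at the top usually amounts to
something like the following on a package-based Ubuntu 16.04 install. This is only a
sketch: it assumes the stock systemd units and that your apt source actually carries
10.2.10 (you may need the upstream download.ceph.com repository for that), so adjust the
package list and repo setup to match your cluster before running anything.

    # check what each daemon is currently running
    ceph -s
    ceph tell osd.* version

    # keep CRUSH from marking OSDs out and rebalancing while daemons restart
    ceph osd set noout

    # on each node (monitor nodes first, then OSD nodes), pull in the 10.2.10 packages
    apt-get update
    apt-get install --only-upgrade ceph ceph-base ceph-common ceph-mon ceph-osd

    # restart daemons one node at a time, waiting for the cluster to settle in between
    systemctl restart ceph-mon.target
    systemctl restart ceph-osd.target

    # once every daemon reports 10.2.10, allow rebalancing again
    ceph osd unset noout

The usual ordering is monitors first, then OSDs, one node at a time, so the cluster keeps
quorum and PGs stay available throughout the upgrade.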