osd dies on pg repair with FAILED assert(!out->snaps.empty())

Hello Cephers!

Trying to repair an inconsistent PG makes the OSD die with an
assertion failure:

     0> 2015-12-01 07:22:13.398006 7f76d6594700 -1 osd/SnapMapper.cc: In function 'int SnapMapper::get_snaps(const hobject_t&, SnapMapper::object_snaps*)' thread 7f76d6594700 time 2015-12-01 07:22:13.394900
osd/SnapMapper.cc: 153: FAILED assert(!out->snaps.empty())

 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc60eb]
 2: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0x40c) [0x72aecc]
 3: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> >*)+0xa2) [0x72b062]
 4: (PG::_scan_snaps(ScrubMap&)+0x454) [0x7f2f84]
 5: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x218) [0x7f3ba8]
 6: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x480) [0x7f9da0]
 7: (PG::scrub(ThreadPool::TPHandle&)+0x2ee) [0x7fb48e]
 8: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x6cdbf9]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 10: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 11: (()+0x8182) [0x7f76fe072182]
 12: (clone()+0x6d) [0x7f76fc5dd47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.339.log
--- end dump of recent events ---
2015-12-01 07:22:13.476525 7f76d6594700 -1 *** Caught signal (Aborted) **
 in thread 7f76d6594700

ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7f76fe07a340]
 3: (gsignal()+0x39) [0x7f76fc519cc9]
 4: (abort()+0x148) [0x7f76fc51d0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f76fce24535]
 6: (()+0x5e6d6) [0x7f76fce226d6]
 7: (()+0x5e703) [0x7f76fce22703]
 8: (()+0x5e922) [0x7f76fce22922]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbc62d8]
 10: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0x40c) [0x72aecc]
 11: (SnapMapper::get_snaps(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> >*)+0xa2) [0x72b062]
 12: (PG::_scan_snaps(ScrubMap&)+0x454) [0x7f2f84]
 13: (PG::build_scrub_map_chunk(ScrubMap&, hobject_t, hobject_t, bool, unsigned int, ThreadPool::TPHandle&)+0x218) [0x7f3ba8]
 14: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x480) [0x7f9da0]
 15: (PG::scrub(ThreadPool::TPHandle&)+0x2ee) [0x7fb48e]
 16: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x6cdbf9]
 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e]
 18: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0]
 19: (()+0x8182) [0x7f76fe072182]
 20: (clone()+0x6d) [0x7f76fc5dd47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
    -4> 2015-12-01 07:22:13.403280 7f76e4db1700  1 -- 10.9.246.104:6887/8548 <== osd.109 10.9.245.204:0/3407 13 ==== osd_ping(ping e320057 stamp 2015-12-01 07:22:13.399779) v2 ==== 47+0+0 (1340520147 0 0) 0x22456800 con 0x22340b00
    -3> 2015-12-01 07:22:13.403313 7f76e4db1700  1 -- 10.9.246.104:6887/8548 --> 10.9.245.204:0/3407 -- osd_ping(ping_reply e320057 stamp 2015-12-01 07:22:13.399779) v2 -- ?+0 0x23e3be00 con 0x22340b00
    -2> 2015-12-01 07:22:13.403365 7f76e35ae700  1 -- 10.9.246.104:6883/8548 <== osd.109 10.9.245.204:0/3407 13 ==== osd_ping(ping e320057 stamp 2015-12-01 07:22:13.399779) v2 ==== 47+0+0 (1340520147 0 0) 0x22457600 con 0x22570d60
    -1> 2015-12-01 07:22:13.403405 7f76e35ae700  1 -- 10.9.246.104:6883/8548 --> 10.9.245.204:0/3407 -- osd_ping(ping_reply e320057 stamp 2015-12-01 07:22:13.399779) v2 -- ?+0 0x23e3fe00 con 0x22570d60
--- end dump of recent events ---

2015-12-01 07:22:13.889279 7f0be9daf900  0 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 12810
2015-12-01 07:22:13.904298 7f0be9daf900  0 filestore(/var/lib/ceph/osd/ceph-339) backend xfs (magic 0x58465342)

Since the assertion mentions snapshots, I deleted a number of them and
ran the repair again. This took two other hosts out of operation: the
kernel reported hung tasks (ceph-osd processes waiting in xfs_fs_sync)
and the load climbed to ~10000. Thankfully, power-cycling those hosts
was enough to recover them.
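For context on the assertion: judging from the backtrace, the repair
runs a scrub (PG::chunky_scrub calls PG::_scan_snaps), which looks up
the snapshot set recorded for each object through
SnapMapper::get_snaps(), and the OSD aborts because that set came back
empty. Below is a minimal, hypothetical C++ model of that check, not
Ceph's code: ToySnapMapper and InvariantViolation are invented names,
an in-memory std::map stands in for the OSD's on-disk snap mapping, and
an exception stands in for the abort.

```cpp
// Hypothetical, simplified model of the check behind
// "FAILED assert(!out->snaps.empty())". Not Ceph's implementation.
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

using snapid = std::uint64_t;  // stand-in for Ceph's snapid_t

// Thrown where the real OSD would abort on a failed assertion.
struct InvariantViolation : std::runtime_error {
  using std::runtime_error::runtime_error;
};

class ToySnapMapper {
 public:
  // Records which snapshots an object (e.g. a clone) belongs to.
  void set_snaps(const std::string& oid, const std::set<snapid>& snaps) {
    records_[oid] = snaps;
  }

  // A missing record is an ordinary error (-ENOENT); a record that exists
  // but holds no snap ids is treated as corruption and stops processing.
  int get_snaps(const std::string& oid, std::set<snapid>* out) const {
    auto it = records_.find(oid);
    if (it == records_.end()) return -ENOENT;
    if (it->second.empty())
      throw InvariantViolation("empty snap set for " + oid);
    if (out) *out = it->second;
    return 0;
  }

 private:
  std::map<std::string, std::set<snapid>> records_;  // oid -> snap ids
};
```

In this model a missing record is a normal error return, while a
record that exists but is empty is fatal, which is consistent with the
log reporting an assertion failure rather than an error code.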

osd-339 is now much more chatty before dying:
http://www.traced.net/u/toasta/tmp/ceph-osd.339.log.txt

How do I get this PG to cooperate again?
Is it safe to just delete it from the filesystem and let it recover
from one of the replicas?

Thx in advance
  Benedikt
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


