Part 2: SSD OSD fails often with "FAILED assert(soid < scrubber.start || soid >= scrubber.end)"

Hi again,
sorry for not replying in the thread, but my last email never came back
to me via the mailing list (I often miss some posts!).

Just after sending the last mail another SSD failed for the first time -
in this case a cheap one, but with the same error (some notes on what I
think the assert means follow after the log):

root@ceph-04:/var/log/ceph# more ceph-osd.62.log
2015-01-13 16:40:55.712967 7fb29cfd3700  0 log [INF] : 17.2 scrub ok
2015-01-13 17:54:35.548361 7fb29dfd5700  0 log [INF] : 17.3 scrub ok
2015-01-13 17:54:38.007014 7fb29dfd5700  0 log [INF] : 17.5 scrub ok
2015-01-13 17:54:41.215558 7fb29d7d4700  0 log [INF] : 17.f scrub ok
2015-01-13 17:54:42.277585 7fb29dfd5700  0 log [INF] : 17.a scrub ok
2015-01-13 17:54:48.961582 7fb29d7d4700  0 log [INF] : 17.6 scrub ok
2015-01-13 20:15:08.749597 7fb292337700  0 -- 192.168.3.14:6824/9185 >> 192.168.3.15:6824/11735 pipe(0x107d9680 sd=307 :6824 s=2 pgs=2 cs=1 l=0 c=0x124a09a0).fault, initiating reconnect
2015-01-13 20:15:08.750803 7fb296dbe700  0 -- 192.168.3.14:0/9185 >> 192.168.3.15:6825/11735 pipe(0xd011180 sd=42 :0 s=1 pgs=0 cs=0 l=1 c=0x8d19760).fault
2015-01-13 20:15:08.750804 7fb292b3f700  0 -- 192.168.3.14:0/9185 >> 172.20.2.15:6837/11735 pipe(0x1210f900 sd=66 :0 s=1 pgs=0 cs=0 l=1 c=0xbeae840).fault
2015-01-13 20:15:08.751056 7fb291d31700  0 -- 192.168.3.14:6824/9185 >> 192.168.3.15:6824/11735 pipe(0x107d9680 sd=29 :6824 s=1 pgs=2 cs=2 l=0 c=0x124a09a0).fault
2015-01-13 20:15:27.035342 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:07.035339)
2015-01-13 20:15:28.036773 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:08.036769)
2015-01-13 20:15:28.945179 7fb29b7d0700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:08.945178)
2015-01-13 20:15:29.037016 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:09.037014)
2015-01-13 20:15:30.037204 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:10.037202)
2015-01-13 20:15:30.645491 7fb29b7d0700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:10.645483)
2015-01-13 20:15:31.037326 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:11.037323)
2015-01-13 20:15:32.037442 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:12.037439)
2015-01-13 20:15:33.037641 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:13.037637)
2015-01-13 20:15:34.037843 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:14.037839)
2015-01-13 21:39:35.241153 7fb29dfd5700  0 log [INF] : 17.d scrub ok
2015-01-13 21:39:39.293113 7fb29a7ce700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7fb29a7ce700 time 2015-01-13 21:39:39.279799
osd/ReplicatedPG.cc: 5306: FAILED assert(soid < scrubber.start || soid >= scrubber.end)

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)+0x1320) [0x9296b0]
 2: (ReplicatedPG::try_flush_mark_clean(boost::shared_ptr<ReplicatedPG::FlushOp>)+0x5f6) [0x92b076]
 3: (ReplicatedPG::finish_flush(hobject_t, unsigned long, int)+0x296) [0x92b876]
 4: (C_Flush::finish(int)+0x86) [0x986226]
 5: (Context::complete(int)+0x9) [0x78f449]
 6: (Finisher::finisher_thread_entry()+0x1c8) [0xad5a18]
 7: (()+0x6b50) [0x7fb2b94ceb50]
 8: (clone()+0x6d) [0x7fb2b80dc7bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
  -127> 2015-01-10 19:39:41.861724 7fb2b9faa780  5 asok(0x28e4230) register_command perfcounters_dump hook 0x28d4010
  -126> 2015-01-10 19:39:41.861749 7fb2b9faa780  5 asok(0x28e4230) register_command 1 hook 0x28d4010
  -125> 2015-01-10 19:39:41.861753 7fb2b9faa780  5 asok(0x28e4230) register_command perf dump hook 0x28d4010
  -124> 2015-01-10 19:39:41.861756 7fb2b9faa780  5 asok(0x28e4230) register_command perfcounters_schema hook 0x28d4010
  -123> 2015-01-10 19:39:41.861759 7fb2b9faa780  5 asok(0x28e4230) register_command 2 hook 0x28d4010
  -122> 2015-01-10 19:39:41.861762 7fb2b9faa780  5 asok(0x28e4230) register_command perf schema hook 0x28d4010
  -121> 2015-01-10 19:39:41.861764 7fb2b9faa780  5 asok(0x28e4230) register_command config show hook 0x28d4010
  -120> 2015-01-10 19:39:41.861768 7fb2b9faa780  5 asok(0x28e4230) register_command config set hook 0x28d4010
  -119> 2015-01-10 19:39:41.861773 7fb2b9faa780  5 asok(0x28e4230) register_command config get hook 0x28d4010
  -118> 2015-01-10 19:39:41.861779 7fb2b9faa780  5 asok(0x28e4230) register_command log flush hook 0x28d4010
  -117> 2015-01-10 19:39:41.861784 7fb2b9faa780  5 asok(0x28e4230) register_command log dump hook 0x28d4010
  -116> 2015-01-10 19:39:41.861789 7fb2b9faa780  5 asok(0x28e4230) register_command log reopen hook 0x28d4010
  -115> 2015-01-10 19:39:41.864385 7fb2b9faa780  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process ceph-osd, pid 9185
  -114> 2015-01-10 19:39:41.873624 7fb2b9faa780  1 finished global_init_daemonize
  -113> 2015-01-10 19:39:41.892039 7fb2b9faa780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-62) detect_features: FIEMAP ioctl is supported and appears to work
  -112> 2015-01-10 19:39:41.892081 7fb2b9faa780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-62) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
  -111> 2015-01-10 19:39:41.902334 7fb2b9faa780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-62) detect_features: syscall(SYS_syncfs, fd) fully supported
  -110> 2015-01-10 19:39:41.983875 7fb2b9faa780  0 filestore(/var/lib/ceph/osd/ceph-62) limited size xattrs
  -109> 2015-01-10 19:39:42.112708 7fb2b9faa780  0 filestore(/var/lib/ceph/osd/ceph-62) mount: enabling WRITEAHEAD journal mode: checkpoint
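
If I read the heartbeat lines right, the cutoff is always exactly 20
seconds before the message timestamp (e.g. 20:15:27.035342 -
20:15:07.035339 = 20.000003 s), which matches the default osd heartbeat
grace of 20 s - so assuming we haven't changed that value, osd.62 simply
got no reply from osd.61 after 20:15:06.843259 and started complaining
20 seconds later.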

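If I understand the backtrace right, the assert fires while a
cache-tier flush is being completed (C_Flush -> finish_flush ->
try_flush_mark_clean -> finish_ctx) on an object that apparently falls
inside the chunk the scrubber is working on at that moment. Roughly,
the invariant behind the assert looks like the sketch below - only an
illustration with made-up names (ObjectId, Scrubber, finish_write), not
the actual firefly code:

#include <cassert>
#include <string>

struct ObjectId {                  // stand-in for Ceph's hobject_t
  std::string name;
  bool operator<(const ObjectId &o) const { return name < o.name; }
  bool operator>=(const ObjectId &o) const { return !(*this < o); }
};

struct Scrubber {                  // the chunk currently being scrubbed
  ObjectId start;                  // inclusive lower bound
  ObjectId end;                    // exclusive upper bound
};

// Completing a write (here: the completion of a flush) is only legal for
// objects outside [scrubber.start, scrubber.end); writes on objects
// inside the chunk are supposed to be blocked until the chunk is done.
void finish_write(const ObjectId &soid, const Scrubber &scrubber) {
  assert(soid < scrubber.start || soid >= scrubber.end);
  // ... apply the update ...
}

int main() {
  Scrubber s{{"obj_0100"}, {"obj_0200"}}; // scrubbing [obj_0100, obj_0200)
  finish_write({"obj_0042"}, s);          // fine: outside the scrub range
  finish_write({"obj_0150"}, s);          // aborts, like osd.62 did
}

So it looks to me like a flush completion slipped through while its
object was inside the range currently being scrubbed.
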
Udo