Hi,

This is http://tracker.ceph.com/issues/8011, which is being backported.

Cheers

On 13/01/2015 22:00, Udo Lembke wrote:
> Hi again,
> sorry for not threading, but my last email didn't come back on the mailing
> list (I often miss some posts!).
>
> Just after sending the last mail, another SSD failed for the first time, in
> this case a cheap one, but with the same error:
>
> root@ceph-04:/var/log/ceph# more ceph-osd.62.log
> 2015-01-13 16:40:55.712967 7fb29cfd3700 0 log [INF] : 17.2 scrub ok
> 2015-01-13 17:54:35.548361 7fb29dfd5700 0 log [INF] : 17.3 scrub ok
> 2015-01-13 17:54:38.007014 7fb29dfd5700 0 log [INF] : 17.5 scrub ok
> 2015-01-13 17:54:41.215558 7fb29d7d4700 0 log [INF] : 17.f scrub ok
> 2015-01-13 17:54:42.277585 7fb29dfd5700 0 log [INF] : 17.a scrub ok
> 2015-01-13 17:54:48.961582 7fb29d7d4700 0 log [INF] : 17.6 scrub ok
> 2015-01-13 20:15:08.749597 7fb292337700 0 -- 192.168.3.14:6824/9185 >> 192.168.3.15:6824/11735 pipe(0x107d9680 sd=307 :6824 s=2 pgs=2 cs=1 l=0 c=0x124a09a0).fault, initiating reconnect
> 2015-01-13 20:15:08.750803 7fb296dbe700 0 -- 192.168.3.14:0/9185 >> 192.168.3.15:6825/11735 pipe(0xd011180 sd=42 :0 s=1 pgs=0 cs=0 l=1 c=0x8d19760).fault
> 2015-01-13 20:15:08.750804 7fb292b3f700 0 -- 192.168.3.14:0/9185 >> 172.20.2.15:6837/11735 pipe(0x1210f900 sd=66 :0 s=1 pgs=0 cs=0 l=1 c=0xbeae840).fault
> 2015-01-13 20:15:08.751056 7fb291d31700 0 -- 192.168.3.14:6824/9185 >> 192.168.3.15:6824/11735 pipe(0x107d9680 sd=29 :6824 s=1 pgs=2 cs=2 l=0 c=0x124a09a0).fault
> 2015-01-13 20:15:27.035342 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:07.035339)
> 2015-01-13 20:15:28.036773 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:08.036769)
> 2015-01-13 20:15:28.945179 7fb29b7d0700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:08.945178)
> 2015-01-13 20:15:29.037016 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:09.037014)
> 2015-01-13 20:15:30.037204 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:10.037202)
> 2015-01-13 20:15:30.645491 7fb29b7d0700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:10.645483)
> 2015-01-13 20:15:31.037326 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:11.037323)
> 2015-01-13 20:15:32.037442 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:12.037439)
> 2015-01-13 20:15:33.037641 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:13.037637)
> 2015-01-13 20:15:34.037843 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:14.037839)
> 2015-01-13 21:39:35.241153 7fb29dfd5700 0 log [INF] : 17.d scrub ok
> 2015-01-13 21:39:39.293113 7fb29a7ce700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7fb29a7ce700 time 2015-01-13 21:39:39.279799
> osd/ReplicatedPG.cc: 5306: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
>
> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> 1: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)+0x1320) [0x9296b0]
> 2: (ReplicatedPG::try_flush_mark_clean(boost::shared_ptr<ReplicatedPG::FlushOp>)+0x5f6) [0x92b076]
> 3: (ReplicatedPG::finish_flush(hobject_t, unsigned long, int)+0x296) [0x92b876]
> 4: (C_Flush::finish(int)+0x86) [0x986226]
> 5: (Context::complete(int)+0x9) [0x78f449]
> 6: (Finisher::finisher_thread_entry()+0x1c8) [0xad5a18]
> 7: (()+0x6b50) [0x7fb2b94ceb50]
> 8: (clone()+0x6d) [0x7fb2b80dc7bd]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- begin dump of recent events ---
> -127> 2015-01-10 19:39:41.861724 7fb2b9faa780 5 asok(0x28e4230) register_command perfcounters_dump hook 0x28d4010
> -126> 2015-01-10 19:39:41.861749 7fb2b9faa780 5 asok(0x28e4230) register_command 1 hook 0x28d4010
> -125> 2015-01-10 19:39:41.861753 7fb2b9faa780 5 asok(0x28e4230) register_command perf dump hook 0x28d4010
> -124> 2015-01-10 19:39:41.861756 7fb2b9faa780 5 asok(0x28e4230) register_command perfcounters_schema hook 0x28d4010
> -123> 2015-01-10 19:39:41.861759 7fb2b9faa780 5 asok(0x28e4230) register_command 2 hook 0x28d4010
> -122> 2015-01-10 19:39:41.861762 7fb2b9faa780 5 asok(0x28e4230) register_command perf schema hook 0x28d4010
> -121> 2015-01-10 19:39:41.861764 7fb2b9faa780 5 asok(0x28e4230) register_command config show hook 0x28d4010
> -120> 2015-01-10 19:39:41.861768 7fb2b9faa780 5 asok(0x28e4230) register_command config set hook 0x28d4010
> -119> 2015-01-10 19:39:41.861773 7fb2b9faa780 5 asok(0x28e4230) register_command config get hook 0x28d4010
> -118> 2015-01-10 19:39:41.861779 7fb2b9faa780 5 asok(0x28e4230) register_command log flush hook 0x28d4010
> -117> 2015-01-10 19:39:41.861784 7fb2b9faa780 5 asok(0x28e4230) register_command log dump hook 0x28d4010
> -116> 2015-01-10 19:39:41.861789 7fb2b9faa780 5 asok(0x28e4230) register_command log reopen hook 0x28d4010
> -115> 2015-01-10 19:39:41.864385 7fb2b9faa780 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process ceph-osd, pid 9185
> -114> 2015-01-10 19:39:41.873624 7fb2b9faa780 1 finished global_init_daemonize
> -113> 2015-01-10 19:39:41.892039 7fb2b9faa780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-62) detect_features: FIEMAP ioctl is supported and appears to work
> -112> 2015-01-10 19:39:41.892081 7fb2b9faa780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-62) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
> -111> 2015-01-10 19:39:41.902334 7fb2b9faa780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-62) detect_features: syscall(SYS_syncfs, fd) fully supported
> -110> 2015-01-10 19:39:41.983875 7fb2b9faa780 0 filestore(/var/lib/ceph/osd/ceph-62) limited size xattrs
> -109> 2015-01-10 19:39:42.112708 7fb2b9faa780 0 filestore(/var/lib/ceph/osd/ceph-62) mount: enabling WRITEAHEAD journal mode: checkpoint
>
> Udo
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Loïc Dachary, Artisan Logiciel Libre