0 mlcod 0'0 active+clean] hit_set_trim 37:38000000:.ceph-internal::
2017-10-22 17:32:56.033936 7f3acaff5700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&, unsigned int)'
osd/ReplicatedPG.cc: 11782: FAILED assert(obc)
Hello,
Today I ran a lot of read I/O with a simple rsync... and again an OSD crashed (trace above).
As before, I can't restart the OSD: it just keeps crashing. So the OSD is out and the cluster is recovering.
I only had time to increase the OSD log level:
# ceph tell osd.14 injectargs '--debug-osd 5/5'
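If it crashes again, I can raise it to 10 or more as you asked. From memory the syntax would be one of these (osd.14 is just the one that crashed today; the admin socket form is the fallback if the OSD no longer answers tell):

# ceph tell osd.14 injectargs '--debug-osd 10/10'
# ceph daemon osd.14 config set debug_osd 10/10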
Attached is the log, extracted with:
# grep -B100 -A100 objdump /var/log/ceph/ceph-osd.14.log
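For reference, grepping around the assert itself should pull out the same context, for example:

# grep -B100 -A100 'FAILED assert(obc)' /var/log/ceph/ceph-osd.14.log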
If I run another read, another OSD will probably crash.
Any idea?
I will probably plan to move the data from the erasure-coded pool to a 3x replicated pool; the cluster is becoming unstable without any change on our side.
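If I do, my rough plan (untested; pool and image names below are just examples) is to create a new replicated pool and copy each image across with an export/import pipe:

# ceph osd pool create rbd-replica3 1024 1024 replicated
# rbd export ec-pool/disk1 - | rbd import - rbd-replica3/disk1

As far as I know this kind of copy does not preserve snapshots, so I would handle those separately.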
Regards,
PS: Last Sunday I lost an RBD header while removing the cache tier... Many thanks to http://fnordahl.com/2017/04/17/ceph-rbd-volume-header-recovery/ for the procedure to recreate it and resurrect the RBD disk :)
On 19/10/2017 at 00:19, Brad Hubbard wrote:
On Wed, Oct 18, 2017 at 11:16 PM, pascal.pucci@xxxxxxxxxxxxxxx <pascal.pucci@xxxxxxxxxxxxxxx> wrote:

Hello, for two weeks I have been losing OSDs from time to time. Here is the trace:

0> 2017-10-18 05:16:40.873511 7f7c1e497700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&, unsigned int)' thread 7f7c1e497700 time 2017-10-18 05:16:40.869962
osd/ReplicatedPG.cc: 11782: FAILED assert(obc)

Can you try to capture a log with debug_osd set to 10 or greater as per http://tracker.ceph.com/issues/19185 ? This will allow us to see the output from the PrimaryLogPG::get_object_context() function which may help identify the problem. Please also check your machines all have the same time zone set and their clocks are in sync.

ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x55eec15a09e5]
2: (ReplicatedPG::hit_set_trim(std::unique_ptr<ReplicatedPG::OpContext, std::default_delete<ReplicatedPG::OpContext> >&, unsigned int)+0x6dd) [0x55eec107a52d]
3: (ReplicatedPG::hit_set_persist()+0xd7c) [0x55eec107d1bc]
4: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x1a92) [0x55eec109bbe2]
5: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x747) [0x55eec10588a7]
6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x41d) [0x55eec0f0bbad]
7: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d) [0x55eec0f0bdfd]
8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x77b) [0x55eec0f0f7db]
9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887) [0x55eec1590987]
10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55eec15928f0]
11: (()+0x7e25) [0x7f7c4fd52e25]
12: (clone()+0x6d) [0x7f7c4e3dc34d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I am using Jewel 10.2.10, with an erasure-coded pool (2+1) plus an NVMe cache tier (writeback) with 3 replicas, serving simple RBD disks (12 SATA OSDs on each of 4 nodes + 1 NVMe on each node = 48 SATA OSDs + 8 NVMe OSDs; I split each NVMe in 2). Last week it was only the NVMe OSDs that crashed, so I unmapped all disks, destroyed the cache tier and recreated it. From that day it worked fine. Today an OSD crashed again, but this time it was not an NVMe OSD, it was a normal SATA OSD. Any idea? What about this 'void ReplicatedPG::hit_set_trim'? Thanks for your help. Regards,

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
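About the time zone / clock sync question: I will double-check NTP on all four nodes with something like this (host names are placeholders for my nodes) and report back if any of them is off:

# for h in node1 node2 node3 node4; do ssh $h 'date; ntpq -p | head -n 3'; done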
--
Performance Conseil Informatique
Pascal Pucci
Consultant Infrastructure
pascal.pucci@xxxxxxxxxxxxxxx
Mobile : 06 51 47 84 98
Bureau : 02 85 52 41 81
http://www.performance-conseil-informatique.net
News: Very happy to be delivering storage continuity projects with DataCore since 2008. PCI is a DataCore Silver partner. Thanks to DataCore.
--
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com