Hi all,
I seem to be hitting these tracker issues:
https://tracker.ceph.com/issues/23145
http://tracker.ceph.com/issues/24422
PGs 6.1 and 6.3f are the ones having the issues.
When I list all the PGs of a down OSD with:
ceph-objectstore-tool --dry-run --type bluestore --data-path /var/lib/ceph/osd/ceph-17/ --op list-pgs
There are a lot of 'double' pgids like these (also for other PGs):
6.3fs3
6.3fs5
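
To get a feel for how many shards of the same PG ended up on this single OSD, I'm counting the list-pgs output roughly like this (same dry-run command as above; the sed just strips the shard suffix, so treat it as a sketch):

ceph-objectstore-tool --dry-run --type bluestore --data-path /var/lib/ceph/osd/ceph-17/ --op list-pgs | sed 's/s[0-9]*$//' | sort | uniq -c | sort -rn | head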
Is that normal? I would assume different EC shards of the same PG would be on separate OSDs.
We still have 4 OSDs down and 2 PGs down+remapped, and I can't find any way to get the crashed OSDs back up.
pg 6.1 is down+remapped, acting [6,3,2147483647,29,2147483647,2147483647]
pg 6.3f is down+remapped, acting [20,24,2147483647,2147483647,3,28]
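
(The 2147483647 entries are just how 'NONE' is rendered in the acting set, i.e. no OSD is currently mapped for that shard.) For reference, this is roughly how I'm inspecting the two down PGs and the pool's EC layout; the pool and profile names below are placeholders:

ceph pg 6.1 query          # recovery_state shows which down OSDs it still wants to probe
ceph pg map 6.3f           # current up/acting mapping
ceph osd pool get <ec-pool-name> erasure_code_profile
ceph osd erasure-code-profile get <profile-name>       # k, m and crush-failure-domain

Before trying anything destructive I'm also considering exporting the surviving shards from the crashed OSDs with ceph-objectstore-tool (with the OSD stopped; the pgid/shard below is just an example), e.g.:

ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-16/ --pgid 6.1s2 --op export --file /root/6.1s2.export

Does that sound like a sane approach?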
Kind regards,
Caspar Smit
2018-06-08 8:53 GMT+02:00 Caspar Smit <casparsmit@xxxxxxxxxxx>:
Update:
I've unset nodown to let it continue, but now 4 OSDs are down and cannot be brought up again. Here's what the logfile reads:

2018-06-08 08:35:01.716245 7f4c58de4700 0 log_channel(cluster) log [INF] : 6.e3s0 continuing backfill to osd.37(4) from (10864'911406,11124'921472] 6:c7d71bbd:::rbd_data.5.6c1d9574b0dc51.0000000000bf38b9:head to 11124'921472
2018-06-08 08:35:01.727261 7f4c585e3700 -1 bluestore(/var/lib/ceph/osd/ceph-16) _txc_add_transaction error (2) No such file or directory not handled on operation 30 (op 0, counting from 0)
2018-06-08 08:35:01.727273 7f4c585e3700 -1 bluestore(/var/lib/ceph/osd/ceph-16) ENOENT on clone suggests osd bug
2018-06-08 08:35:01.730584 7f4c585e3700 -1 /home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f4c585e3700 time 2018-06-08 08:35:01.727379
/home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: 9363: FAILED assert(0 == "unexpected error")

 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x558e08ba4202]
 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x15fa) [0x558e08a55c3a]
 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546) [0x558e08a572a6]
 4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f) [0x558e085fa37f]
 5: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x6c) [0x558e0857db5c]
 6: (OSD::process_peering_events(std::__cxx11::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x442) [0x558e085abec2]
 7: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x2c) [0x558e0861a91c]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x558e08bab3a8]
 9: (ThreadPool::WorkThread::entry()+0x10) [0x558e08bac540]
 10: (()+0x7494) [0x7f4c709ca494]
 11: (clone()+0x3f) [0x7f4c6fa51aff]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Any help is highly appreciated.

Kind regards,
Caspar Smit

2018-06-08 7:57 GMT+02:00 Caspar Smit <casparsmit@xxxxxxxxxxx>:

Well I let it run with the nodown flag set and it looked like it would finish, BUT it all went wrong somewhere. This is now the state:
  health: HEALTH_ERR
          nodown flag(s) set
          5602396/94833780 objects misplaced (5.908%)
          Reduced data availability: 143 pgs inactive, 142 pgs peering, 7 pgs stale
          Degraded data redundancy: 248859/94833780 objects degraded (0.262%), 194 pgs unclean, 21 pgs degraded, 12 pgs undersized
          11 stuck requests are blocked > 4096 sec

  pgs:    13.965% pgs not active
          248859/94833780 objects degraded (0.262%)
          5602396/94833780 objects misplaced (5.908%)
          830 active+clean
          75  remapped+peering
          66  peering
          26  active+remapped+backfill_wait
          6   active+undersized+degraded+remapped+backfill_wait
          6   active+recovery_wait+degraded+remapped
          3   active+undersized+degraded+remapped+backfilling
          3   stale+active+undersized+degraded+remapped+backfill_wait
          3   stale+active+remapped+backfill_wait
          2   active+recovery_wait+degraded
          2   active+remapped+backfilling
          1   activating+degraded+remapped
          1   stale+remapped+peering

'ceph health detail' shows:

REQUEST_STUCK 11 stuck requests are blocked > 4096 sec
    11 ops are blocked > 16777.2 sec
    osds 4,7,23,24 have stuck requests > 16777.2 sec
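
If it helps with diagnosing the stuck requests, I can dump the blocked ops on the affected OSDs via their admin sockets, something along these lines (OSD ids taken from the health detail above; commands from memory, so treat this as a sketch):

ceph daemon osd.4 dump_blocked_ops      # likewise for osd.7, osd.23 and osd.24
ceph osd blocked-by                     # shows which OSDs are blocking peering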
So what happened and what should I do now?

Thank you very much for any help.

Kind regards,
Caspar

2018-06-07 13:33 GMT+02:00 Sage Weil <sage@xxxxxxxxxxxx>:

On Wed, 6 Jun 2018, Caspar Smit wrote:
> Hi all,
>
> We have a Luminous 12.2.2 cluster with 3 nodes and I recently added a node
> to it.
>
> osd-max-backfills is at the default of 1, so backfilling didn't go very fast,
> but that doesn't matter.
>
> Once it started backfilling everything looked ok:
>
> ~300 pgs in backfill_wait
> ~10 pgs backfilling (roughly the number of new OSDs)
>
> But I noticed the degraded objects increasing a lot. I presume a PG that is
> in backfill_wait state doesn't accept any new writes anymore? Hence the
> increasing degraded objects?
>
> So far so good, but once in a while I noticed a random OSD flapping (they come
> back up automatically). This isn't because the disk is saturated but because of a
> driver/controller/kernel incompatibility which 'hangs' the disk for a short
> time (scsi abort_task error in syslog). Investigating further, I noticed
> this was already the case before the node expansion.
>
> These flapping OSDs result in lots of PG states which are a bit worrying:
>
> 109 active+remapped+backfill_wait
> 80 active+undersized+degraded+remapped+backfill_wait
> 51 active+recovery_wait+degraded+remapped
> 41 active+recovery_wait+degraded
> 27 active+recovery_wait+undersized+degraded+remapped
> 14 active+undersized+remapped+backfill_wait
> 4 active+undersized+degraded+remapped+backfilling
>
> I think the recovery_wait is more important than the backfill_wait, so I'd
> like to prioritize those, because the recovery_wait was triggered by the
> flapping OSDs.
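>
> If I'm not mistaken, 12.2.x already has the force-recovery command, so I was
> thinking of nudging those PGs manually, roughly like this (syntax from memory,
> <pgid> is a placeholder):
>
> ceph pg ls recovery_wait
> ceph pg force-recovery <pgid> [<pgid>...]
>
> but I'm not sure how much that helps while OSDs keep flapping.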

Just a note: this is fixed in mimic. Previously, we would choose the
highest-priority PG to start recovery on at the time, but once recovery
had started, the appearance of a new PG with a higher priority (e.g.,
because it finished peering after the others) wouldn't preempt/cancel the
other PG's recovery, so you would get behavior like the above.
Mimic implements that preemption, so you should not see behavior like
this. (If you do, then the function that assigns a priority score to a
PG needs to be tweaked.)
sage