Thank you, Igor. I will try to see how to collect the perf values. Not sure about restarting all OSDs, as it's a production cluster - is there a less invasive way?

/Z

On Tue, 9 May 2023 at 23:58, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> Hi Zakhar,
>
> Let's leave questions regarding cache usage/tuning to a different topic for now and concentrate on the performance drop.
>
> Could you please do the same experiment I asked of Nikola once your cluster reaches the "bad performance" state (Nikola, could you please use this improved scenario as well?):
>
> - collect perf counters for every OSD
>
> - reset perf counters for every OSD
>
> - leave the cluster running for 10 mins and collect perf counters again.
>
> - then restart OSDs one-by-one, starting with the worst OSD (in terms of subop_w_lat from the previous step). Wouldn't it be sufficient to restart just a few OSDs before the cluster is back to normal?
>
> - if a partial OSD restart is sufficient, please leave the remaining OSDs running as-is, without a restart.
>
> - after the restart (no matter whether partial or complete - the key thing is that it should be successful), reset all the perf counters, leave the cluster running for 30 mins and collect perf counters again.
>
> - wait 24 hours and collect the counters one more time
>
> - share all four counter snapshots.
>
>
> Thanks,
>
> Igor
>
> On 5/8/2023 11:31 PM, Zakhar Kirpichenko wrote:
>
> Don't mean to hijack the thread, but I may be observing something similar with 16.2.12: OSD performance noticeably peaks after an OSD restart and then gradually degrades over 10-14 days, while commit and apply latencies increase across the board.
>
> Non-default settings are:
>
>     "bluestore_cache_size_hdd": {
>         "default": "1073741824",
>         "mon": "4294967296",
>         "final": "4294967296"
>     },
>     "bluestore_cache_size_ssd": {
>         "default": "3221225472",
>         "mon": "4294967296",
>         "final": "4294967296"
>     },
>     ...
>     "osd_memory_cache_min": {
>         "default": "134217728",
>         "mon": "2147483648",
>         "final": "2147483648"
>     },
>     "osd_memory_target": {
>         "default": "4294967296",
>         "mon": "17179869184",
>         "final": "17179869184"
>     },
>     "osd_scrub_sleep": {
>         "default": 0,
>         "mon": 0.10000000000000001,
>         "final": 0.10000000000000001
>     },
>     "rbd_balance_parent_reads": {
>         "default": false,
>         "mon": true,
>         "final": true
>     },
>
> All other settings are at their defaults; the usage is rather simple OpenStack / RBD.
>
> I also noticed that OSD cache usage doesn't increase over time (see my message "Ceph 16.2.12, bluestore cache doesn't seem to be used much" dated 26 April 2023, which received no comments), even though the OSDs are being used rather heavily and there's plenty of host and OSD cache / target memory available. It may be worth checking whether the available memory is being used in a good way.
>
> /Z
>
> On Mon, 8 May 2023 at 22:35, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
>> Hey Nikola,
>>
>> On 5/8/2023 10:13 PM, Nikola Ciprich wrote:
>> > OK, starting to collect those for all OSDs..
>> > I have hourly samples of all OSDs' perf dumps loaded in a DB, so I can easily examine, sort, whatever..
>>
>> You didn't reset the counters every hour, did you? So average subop_w_latency growing that way means the current values are much higher than before.
>>
>> Curious whether subop latencies were growing for every OSD or just for a subset (maybe even just a single one) of them?
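For the collection steps above, something like this run on each OSD host should do (just a sketch using the admin socket; the socket path, file names and counter layout are assumptions that can vary by deployment and release, and on containerized setups "ceph tell osd.<id> perf dump" from an admin node is an alternative):

# snapshot perf counters for every OSD on this host, then reset them
# (socket path assumes a classic package install; adjust for cephadm)
ts=$(date +%Y%m%d-%H%M%S)
for sock in /var/run/ceph/ceph-osd.*.asok; do
    id=$(basename "$sock" .asok | cut -d. -f2)
    ceph daemon "$sock" perf dump > "perf-osd${id}-${ts}.json"
    ceph daemon "$sock" perf reset all
done

# later: rank OSDs by average subop write latency, worst first
# (counter name as it appears in recent perf dumps; may differ per release)
for f in perf-osd*-*.json; do
    printf '%s %s\n' "$f" "$(jq '.osd.subop_w_latency.avgtime' "$f")"
done | sort -rnk2 | head

That should also answer whether the growing subop_w_lat is cluster-wide or confined to a handful of OSDs.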
>> Next time you reach the bad state please do the following if possible:
>>
>> - reset perf counters for every OSD
>>
>> - leave the cluster running for 10 mins and collect perf counters again.
>>
>> - then start restarting OSDs one-by-one, starting with the worst OSD (in terms of subop_w_lat from the previous step). Wouldn't it be sufficient to restart just a few OSDs before the cluster is back to normal?
>>
>> >> currently values for avgtime are around 0.0003 for subop_w_lat and 0.001-0.002 for op_w_lat
>> > OK, so there is no visible trend on op_w_lat, still between 0.001 and 0.002
>> >
>> > subop_w_lat seems to have increased since yesterday though! I see values from 0.0004 to as high as 0.001
>> >
>> > If some other perf data might be interesting, please let me know..
>> >
>> > During the OSD restarts I noticed a strange thing - restarts on the first 6 machines went smoothly, but then on another 3 I saw rocksdb log recovery on all SSD OSDs, yet at first there was no mention of a daemon crash in ceph -s
>> >
>> > later, crash info appeared, but only for 3 daemons (at least 20 of them crashed in total, though)
>> >
>> > the crash report was similar for all three OSDs:
>> >
>> > [root@nrbphav4a ~]# ceph crash info 2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
>> > {
>> >     "backtrace": [
>> >         "/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
>> >         "(BlueStore::_txc_create(BlueStore::Collection*, BlueStore::OpSequencer*, std::__cxx11::list<Context*, std::allocator<Context*> >*, boost::intrusive_ptr<TrackedOp>)+0x413) [0x55a1c9d07c43]",
>> >         "(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x22b) [0x55a1c9d27e9b]",
>> >         "(ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x8ad) [0x55a1c9bbcfdd]",
>> >         "(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x38f) [0x55a1c99d1cbf]",
>> >         "(PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x57) [0x55a1c99d6777]",
>> >         "(PrimaryLogPG::handle_watch_timeout(std::shared_ptr<Watch>)+0xb73) [0x55a1c99da883]",
>> >         "/usr/bin/ceph-osd(+0x58794e) [0x55a1c992994e]",
>> >         "(CommonSafeTimer<std::mutex>::timer_thread()+0x11a) [0x55a1c9e226aa]",
>> >         "/usr/bin/ceph-osd(+0xa80eb1) [0x55a1c9e22eb1]",
>> >         "/lib64/libc.so.6(+0x9f802) [0x7f64a636e802]",
>> >         "/lib64/libc.so.6(+0x3f450) [0x7f64a630e450]"
>> >     ],
>> >     "ceph_version": "17.2.6",
>> >     "crash_id": "2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3",
>> >     "entity_name": "osd.98",
>> >     "os_id": "almalinux",
>> >     "os_name": "AlmaLinux",
>> >     "os_version": "9.0 (Emerald Puma)",
>> >     "os_version_id": "9.0",
>> >     "process_name": "ceph-osd",
>> >     "stack_sig": "b1a1c5bd45e23382497312202e16cfd7a62df018c6ebf9ded0f3b3ca3c1dfa66",
>> >     "timestamp": "2023-05-08T17:45:47.056675Z",
>> >     "utsname_hostname": "nrbphav4h",
>> >     "utsname_machine": "x86_64",
>> >     "utsname_release": "5.15.90lb9.01",
>> >     "utsname_sysname": "Linux",
>> >     "utsname_version": "#1 SMP Fri Jan 27 15:52:13 CET 2023"
>> > }
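If more daemons crashed than ceph -s showed at first, the crash module should still have a record for each of them; a quick sketch to dump everything it has for comparison (the output file names are just an example):

# list every crash report the cluster has archived ("ceph crash ls-new" shows only unacknowledged ones)
ceph crash ls

# save the full report for each recorded crash id (skipping the header line)
for id in $(ceph crash ls | awk 'NR > 1 {print $1}'); do
    ceph crash info "$id" > "crash-${id}.txt"
done

Comparing the stack_sig fields across those reports would show whether all of the crashes hit the same BlueStore::_txc_create path as the one above.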
"utsname_sysname": "Linux", >> > "utsname_version": "#1 SMP Fri Jan 27 15:52:13 CET 2023" >> > } >> > >> > >> > I was trying to figure out why this particular 3 nodes could behave >> differently >> > and found out from colleagues, that those 3 nodes were added to cluster >> lately >> > with direct install of 17.2.5 (others were installed 15.2.16 and later >> upgraded) >> > >> > not sure whether this is related to our problem though.. >> > >> > I see very similar crash reported here: >> https://tracker.ceph.com/issues/56346 >> > so I'm not reporting.. >> > >> > Do you think this might somehow be the cause of the problem? Anything >> else I should >> > check in perf dumps or elsewhere? >> >> Hmm... don't know yet. Could you please last 20K lines prior the crash >> from e.g two sample OSDs? >> >> And the crash isn't permanent, OSDs are able to start after the >> second(?) shot, aren't they? >> >> > with best regards >> > >> > nik >> > >> > >> > >> > >> > >> > >> -- >> Igor Fedotov >> Ceph Lead Developer >> -- >> croit GmbH, Freseniusstr. 31h, 81247 Munich >> CEO: Martin Verges - VAT-ID: DE310638492 >> Com. register: Amtsgericht Munich HRB 231263 >> Web <https://croit.io/> | LinkedIn <http://linkedin.com/company/croit> | >> Youtube <https://www.youtube.com/channel/UCIJJSKVdcSLGLBtwSFx_epw> | >> Twitter <https://twitter.com/croit_io> >> >> Meet us at the SC22 Conference! Learn more <https://croit.io/croit-sc22> >> Technology Fast50 Award Winner by Deloitte >> < >> https://www2.deloitte.com/de/de/pages/technology-media-and-telecommunications/articles/fast-50-2022-germany-winners.html >> >! >> >> < >> https://www2.deloitte.com/de/de/pages/technology-media-and-telecommunications/articles/fast-50-2022-germany-winners.html >> > >> _______________________________________________ >> ceph-users mailing list -- ceph-users@xxxxxxx >> To unsubscribe send an email to ceph-users-leave@xxxxxxx >> > -- > Igor Fedotov > Ceph Lead Developer > -- > croit GmbH, Freseniusstr. 31h, 81247 Munich > CEO: Martin Verges - VAT-ID: DE310638492 > Com. register: Amtsgericht Munich HRB 231263 > Web <https://croit.io/> | LinkedIn <http://linkedin.com/company/croit> | > Youtube <https://www.youtube.com/channel/UCIJJSKVdcSLGLBtwSFx_epw> | > Twitter <https://twitter.com/croit_io> > > Meet us at the SC22 Conference! Learn more <https://croit.io/croit-sc22> > Technology Fast50 Award Winner by Deloitte > <https://www2.deloitte.com/de/de/pages/technology-media-and-telecommunications/articles/fast-50-2022-germany-winners.html> > ! > > > <https://www2.deloitte.com/de/de/pages/technology-media-and-telecommunications/articles/fast-50-2022-germany-winners.html> > _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx