Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

Don't mean to hijack the thread, but I may be observing something similar
with 16.2.12: OSD performance noticeably peaks after an OSD restart and then
gradually degrades over 10-14 days, while commit and apply latencies
increase across the board.
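
For reference, the per-OSD commit/apply latencies can be watched with
"ceph osd perf"; a trivial sketch for recording the trend over time (the
output file name is just an example):

    # append a timestamped snapshot of per-OSD latencies every 10 minutes
    while true; do
        date >> osd_perf_history.log
        ceph osd perf >> osd_perf_history.log
        sleep 600
    done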

Non-default settings are:

        "bluestore_cache_size_hdd": {
            "default": "1073741824",
            "mon": "4294967296",
            "final": "4294967296"
        },
        "bluestore_cache_size_ssd": {
            "default": "3221225472",
            "mon": "4294967296",
            "final": "4294967296"
        },
...
        "osd_memory_cache_min": {
            "default": "134217728",
            "mon": "2147483648",
            "final": "2147483648"
        },
        "osd_memory_target": {
            "default": "4294967296",
            "mon": "17179869184",
            "final": "17179869184"
        },
        "osd_scrub_sleep": {
            "default": 0,
            "mon": 0.10000000000000001,
            "final": 0.10000000000000001
        },
        "rbd_balance_parent_reads": {
            "default": false,
            "mon": true,
            "final": true
        },

All other settings are default; the usage is rather simple OpenStack / RBD.
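
For anyone wanting to compare on their own cluster: this kind of diff can be
obtained from the OSD admin socket, e.g. (osd.0 is just a placeholder, run
on the OSD host):

    # dump only the options that differ from their defaults,
    # showing default / mon / final values
    ceph daemon osd.0 config diff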

I also noticed that OSD cache usage doesn't increase over time (see my
message "Ceph 16.2.12, bluestore cache doesn't seem to be used much" dated
26 April 2023, which received no comments), despite the OSDs being used
rather heavily and plenty of host and OSD cache / target memory being
available. It may be worth checking whether the available memory is being
used effectively.
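
One way to eyeball this (a sketch; osd.0 is just a placeholder, run on the
OSD host, and jq is only used for readability):

    # bytes currently held by the bluestore caches...
    ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool
        | with_entries(select(.key | startswith("bluestore_cache")))'
    # ...versus the configured memory target
    ceph daemon osd.0 config get osd_memory_target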

/Z

On Mon, 8 May 2023 at 22:35, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> Hey Nikola,
>
> On 5/8/2023 10:13 PM, Nikola Ciprich wrote:
> > OK, starting to collect those for all OSDs..
> > I have hourly samples of all OSDs' perf dumps loaded in a DB, so I can
> > easily examine, sort, whatever..
> >
> You didn't reset the counters every hour, did you? If not, the averages are
> cumulative, so subop_w_latency growing that way means the current values
> are much higher than before.
>
> Curious whether subop latencies were growing for every OSD or just a subset
> (maybe even a single one) of them?
>
>
> Next time you reach the bad state please do the following if possible:
>
> - reset perf counters for every OSD
>
> - leave the cluster running for 10 mins and collect perf counters again.
>
> - Then start restarting OSDs one-by-one, starting with the worst one (in
> terms of subop_w_lat from the previous step). Wouldn't it be sufficient to
> restart just a few OSDs before the cluster is back to normal? (A rough
> shell sketch of this is below.)
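>
> Something along these lines should do (a rough sketch, run on each OSD
> host; the admin socket path pattern and the exact counter name may differ
> per setup):
>
> # reset perf counters on all local OSDs
> for a in /var/run/ceph/ceph-osd.*.asok; do
>     ceph daemon "$a" perf reset all
> done
>
> # ...wait ~10 minutes of regular load, then rank local OSDs by average
> # subop write latency, worst first
> for a in /var/run/ceph/ceph-osd.*.asok; do
>     id=$(basename "$a" | sed 's/ceph-osd\.\(.*\)\.asok/\1/')
>     lat=$(ceph daemon "$a" perf dump | jq -r '.osd.subop_w_latency.avgtime')
>     echo "$id $lat"
> done | sort -k2 -rn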
>
> >> currently values for avgtime are around 0.0003 for subop_w_lat and
> >> 0.001-0.002 for op_w_lat
> > OK, so there is no visible trend on op_w_lat, still between 0.001 and
> > 0.002
> >
> > subop_w_lat seems to have increased since yesterday though! I see values
> > from 0.0004 to as high as 0.001
> >
> > If some other perf data might be interesting, please let me know..
> >
> > During the OSD restarts, I noticed a strange thing - restarts on the
> > first 6 machines went smoothly, but then on another 3 I saw RocksDB log
> > recovery on all SSD OSDs, yet at first there was no mention of any daemon
> > crash in ceph -s.
> >
> > Later, crash info appeared, but only for 3 daemons (at least 20 of them
> > crashed in total, though).
> >
> > crash report was similar for all three OSDs:
> >
> > [root@nrbphav4a ~]# ceph crash info 2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
> > {
> >     "backtrace": [
> >         "/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
> >         "(BlueStore::_txc_create(BlueStore::Collection*, BlueStore::OpSequencer*, std::__cxx11::list<Context*, std::allocator<Context*> >*, boost::intrusive_ptr<TrackedOp>)+0x413) [0x55a1c9d07c43]",
> >         "(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x22b) [0x55a1c9d27e9b]",
> >         "(ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x8ad) [0x55a1c9bbcfdd]",
> >         "(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x38f) [0x55a1c99d1cbf]",
> >         "(PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x57) [0x55a1c99d6777]",
> >         "(PrimaryLogPG::handle_watch_timeout(std::shared_ptr<Watch>)+0xb73) [0x55a1c99da883]",
> >         "/usr/bin/ceph-osd(+0x58794e) [0x55a1c992994e]",
> >         "(CommonSafeTimer<std::mutex>::timer_thread()+0x11a) [0x55a1c9e226aa]",
> >         "/usr/bin/ceph-osd(+0xa80eb1) [0x55a1c9e22eb1]",
> >         "/lib64/libc.so.6(+0x9f802) [0x7f64a636e802]",
> >         "/lib64/libc.so.6(+0x3f450) [0x7f64a630e450]"
> >     ],
> >     "ceph_version": "17.2.6",
> >     "crash_id": "2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3",
> >     "entity_name": "osd.98",
> >     "os_id": "almalinux",
> >     "os_name": "AlmaLinux",
> >     "os_version": "9.0 (Emerald Puma)",
> >     "os_version_id": "9.0",
> >     "process_name": "ceph-osd",
> >     "stack_sig": "b1a1c5bd45e23382497312202e16cfd7a62df018c6ebf9ded0f3b3ca3c1dfa66",
> >     "timestamp": "2023-05-08T17:45:47.056675Z",
> >     "utsname_hostname": "nrbphav4h",
> >     "utsname_machine": "x86_64",
> >     "utsname_release": "5.15.90lb9.01",
> >     "utsname_sysname": "Linux",
> >     "utsname_version": "#1 SMP Fri Jan 27 15:52:13 CET 2023"
> > }
> >
> >
> > I was trying to figure out why these particular 3 nodes would behave
> > differently and found out from colleagues that those 3 nodes were added
> > to the cluster recently with a direct install of 17.2.5 (the others were
> > installed with 15.2.16 and upgraded later).
> >
> > Not sure whether this is related to our problem, though..
> >
> > I see a very similar crash reported here:
> > https://tracker.ceph.com/issues/56346
> > so I'm not reporting it again..
> >
> > Do you think this might somehow be the cause of the problem? Anything
> > else I should check in perf dumps or elsewhere?
>
> Hmm... don't know yet. Could you please share the last 20K lines prior to
> the crash from e.g. two sample OSDs?
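>
> Something like this should grab them, assuming default log locations for a
> non-containerized deployment (osd.98 and the "Caught signal" marker are
> just examples - adjust if the crash was an assert instead):
>
> log=/var/log/ceph/ceph-osd.98.log
> line=$(grep -n -m1 '\*\*\* Caught signal' "$log" | cut -d: -f1)
> head -n "$line" "$log" | tail -n 20000 > osd.98.pre-crash.log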
>
> And the crash isn't permanent - the OSDs are able to start on the second(?)
> attempt, aren't they?
>
> > with best regards
> >
> > nik
> >
> --
> Igor Fedotov
> Ceph Lead Developer
> --
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web <https://croit.io/> | LinkedIn <http://linkedin.com/company/croit> |
> Youtube <https://www.youtube.com/channel/UCIJJSKVdcSLGLBtwSFx_epw> |
> Twitter <https://twitter.com/croit_io>
>
> Meet us at the SC22 Conference! Learn more <https://croit.io/croit-sc22>
> Technology Fast50 Award Winner by Deloitte
> <https://www2.deloitte.com/de/de/pages/technology-media-and-telecommunications/articles/fast-50-2022-germany-winners.html>!
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


