Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

Thank you, Igor. I will try to see how to collect the perf values. I'm not sure
about restarting all OSDs as it's a production cluster; is there a less
invasive way?
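
For the collection itself, I assume something along these lines per OSD host is
what's meant (a rough sketch; it assumes the default admin socket location,
which will differ on containerized deployments):

    # dump current perf counters for every OSD whose admin socket is on this host
    ts=$(date +%Y%m%dT%H%M%S)
    for sock in /var/run/ceph/ceph-osd.*.asok; do
        id=$(basename "$sock" .asok | cut -d. -f2)
        ceph daemon "$sock" perf dump > "perf_${ts}_osd.${id}.json"
    done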

/Z

On Tue, 9 May 2023 at 23:58, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> Hi Zakhar,
>
> Let's leave questions regarding cache usage/tuning to a different topic
> for now. And concentrate on performance drop.
>
> Could you please run the same experiment I asked Nikola to do, once your
> cluster reaches the "bad performance" state (Nikola, could you please use this
> improved scenario as well?):
>
> - collect perf counters for every OSD
>
> - reset perf counters for every OSD
>
> - leave the cluster running for 10 mins and collect perf counters again.
>
> - Then restart the OSDs one by one, starting with the worst OSD (in terms of
> subop_w_lat from the previous step). Would it perhaps be sufficient to restart
> just a few OSDs to get the cluster back to normal?
>
> - if a partial OSD restart is sufficient, please leave the remaining OSDs
> running as-is, without a restart.
>
> - after the restart (no matter whether partial or complete - the key thing is
> that it succeeds), reset all the perf counters, leave the cluster running for
> 30 mins, and collect perf counters again.
>
> - wait 24 hours and collect the counters one more time
>
> - share all four counter snapshots.
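>
> A rough sketch of how the reset and the later ranking could be done (the file
> names are just placeholders, and the jq path for subop_w_lat is written from
> memory, so it may differ slightly between releases):
>
>     # on each OSD host: reset the counters right after taking the first dump
>     for sock in /var/run/ceph/ceph-osd.*.asok; do
>         ceph daemon "$sock" perf reset all
>     done
>
>     # after the 10-minute window, against that round's dump files:
>     # rank OSDs, worst subop_w_lat first, to pick the restart order
>     for f in perf_10min_osd.*.json; do
>         printf '%s %s\n' "$(jq -r '.osd.subop_w_latency.avgtime' "$f")" "$f"
>     done | sort -rn | head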
>
>
> Thanks,
>
> Igor
>
> On 5/8/2023 11:31 PM, Zakhar Kirpichenko wrote:
>
> I don't mean to hijack the thread, but I may be observing something similar
> with 16.2.12: OSD performance noticeably peaks after an OSD restart and then
> gradually degrades over 10-14 days, while commit and apply latencies increase
> across the board.
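>
> A quick way to see those per-OSD latencies, for reference:
>
>     ceph osd perf    # prints per-OSD commit_latency(ms) and apply_latency(ms)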
>
> Non-default settings are:
>
>         "bluestore_cache_size_hdd": {
>             "default": "1073741824",
>             "mon": "4294967296",
>             "final": "4294967296"
>         },
>         "bluestore_cache_size_ssd": {
>             "default": "3221225472",
>             "mon": "4294967296",
>             "final": "4294967296"
>         },
> ...
>         "osd_memory_cache_min": {
>             "default": "134217728",
>             "mon": "2147483648",
>             "final": "2147483648"
>         },
>         "osd_memory_target": {
>             "default": "4294967296",
>             "mon": "17179869184",
>             "final": "17179869184"
>         },
>         "osd_scrub_sleep": {
>             "default": 0,
>             "mon": 0.10000000000000001,
>             "final": 0.10000000000000001
>         },
>         "rbd_balance_parent_reads": {
>             "default": false,
>             "mon": true,
>             "final": true
>         },
>
> All other settings are at their defaults; the usage is a rather simple
> OpenStack / RBD workload.
>
> I also noticed that OSD cache usage doesn't increase over time (see my
> message "Ceph 16.2.12, bluestore cache doesn't seem to be used much" dated
> 26 April 2023, which received no comments), despite the OSDs being used
> rather heavily and plenty of host and OSD cache / target memory being
> available. It may be worth checking whether the available memory is being
> used effectively.
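>
> A rough way to check that on a single OSD (run on the OSD host; osd.0 is just
> an example id, and the jq path / pool names are written from memory, so they
> may differ between releases):
>
>     # how much data/metadata the BlueStore caches actually hold right now
>     ceph daemon osd.0 dump_mempools | \
>         jq '.mempool.by_pool | {bluestore_cache_data, bluestore_cache_onode, bluestore_cache_other}'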
>
> /Z
>
> On Mon, 8 May 2023 at 22:35, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>
>> Hey Nikola,
>>
>> On 5/8/2023 10:13 PM, Nikola Ciprich wrote:
>> > OK, I'm starting to collect those for all OSDs.
>> > I have hourly samples of all OSDs' perf dumps loaded into a DB, so I can
>> > easily examine and sort them as needed.
>> >
>> You didn't reset the counters every hour, did you? So if the average
>> subop_w_latency has been growing that way, the most recent values must have
>> been much higher than before.
>>
>> I'm curious whether the subop latencies were growing for every OSD or just
>> for a subset of them (maybe even a single one)?
>>
>>
>> Next time you reach the bad state please do the following if possible:
>>
>> - reset perf counters for every OSD
>>
>> - leave the cluster running for 10 mins and collect perf counters again.
>>
>> - Then start restarting OSDs one by one, starting with the worst OSD (in
>> terms of subop_w_lat from the previous step). Would it perhaps be sufficient
>> to restart just a few OSDs to get the cluster back to normal?
>>
>> >> currently, values for avgtime are around 0.0003 for subop_w_lat and
>> >> 0.001-0.002 for op_w_lat
>> > OK, so there is no visible trend in op_w_lat, still between 0.001 and
>> > 0.002
>> >
>> > subop_w_lat seems to have increased since yesterday, though! I see values
>> > from 0.0004 to as high as 0.001
>> >
>> > If some other perf data might be interesting, please let me know..
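>> >
>> > For extracting the two values from one dump, something like this (the jq
>> > paths are assumed from the counter names; the file name is a placeholder):
>> >
>> >     jq '{op_w: .osd.op_w_latency.avgtime, subop_w: .osd.subop_w_latency.avgtime}' \
>> >         perf_dump_osd.12.json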
>> >
>> > During the OSD restarts, I noticed a strange thing - restarts on the first
>> > 6 machines went smoothly, but on another 3, I saw RocksDB log recovery on
>> > all SSD OSDs. At first I didn't see any mention of a daemon crash in ceph -s.
>> >
>> > Later, crash info appeared, but only for 3 daemons (even though at least 20
>> > of them crashed in total).
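>> >
>> > For reference, the crash entries can be listed and inspected with the
>> > standard crash commands:
>> >
>> >     ceph crash ls          # one line per recorded crash: id and entity
>> >     ceph crash info <id>   # full report for a single crash, as below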
>> >
>> > The crash report was similar for all three OSDs:
>> >
>> > [root@nrbphav4a ~]# ceph crash info
>> 2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3
>> > {
>> >      "backtrace": [
>> >          "/lib64/libc.so.6(+0x54d90) [0x7f64a6323d90]",
>> >          "(BlueStore::_txc_create(BlueStore::Collection*,
>> BlueStore::OpSequencer*, std::__cxx11::list<Context*,
>> std::allocator<Context*> >*, boost::intrusive_ptr<TrackedOp>)+0x413)
>> [0x55a1c9d07c43]",
>> >
>> "(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
>> std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction>
>> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x22b)
>> [0x55a1c9d27e9b]",
>> >          "(ReplicatedBackend::submit_transaction(hobject_t const&,
>> object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction,
>> std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t
>> const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&,
>> std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t,
>> boost::intrusive_ptr<OpRequest>)+0x8ad) [0x55a1c9bbcfdd]",
>> >          "(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*,
>> PrimaryLogPG::OpContext*)+0x38f) [0x55a1c99d1cbf]",
>> >
>> "(PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext,
>> std::default_delete<PrimaryLogPG::OpContext> >)+0x57) [0x55a1c99d6777]",
>> >
>> "(PrimaryLogPG::handle_watch_timeout(std::shared_ptr<Watch>)+0xb73)
>> [0x55a1c99da883]",
>> >          "/usr/bin/ceph-osd(+0x58794e) [0x55a1c992994e]",
>> >          "(CommonSafeTimer<std::mutex>::timer_thread()+0x11a)
>> [0x55a1c9e226aa]",
>> >          "/usr/bin/ceph-osd(+0xa80eb1) [0x55a1c9e22eb1]",
>> >          "/lib64/libc.so.6(+0x9f802) [0x7f64a636e802]",
>> >          "/lib64/libc.so.6(+0x3f450) [0x7f64a630e450]"
>> >      ],
>> >      "ceph_version": "17.2.6",
>> >      "crash_id":
>> "2023-05-08T17:45:47.056675Z_a5759fe9-60c6-423a-88fc-57663f692bd3",
>> >      "entity_name": "osd.98",
>> >      "os_id": "almalinux",
>> >      "os_name": "AlmaLinux",
>> >      "os_version": "9.0 (Emerald Puma)",
>> >      "os_version_id": "9.0",
>> >      "process_name": "ceph-osd",
>> >      "stack_sig":
>> "b1a1c5bd45e23382497312202e16cfd7a62df018c6ebf9ded0f3b3ca3c1dfa66",
>> >      "timestamp": "2023-05-08T17:45:47.056675Z",
>> >      "utsname_hostname": "nrbphav4h",
>> >      "utsname_machine": "x86_64",
>> >      "utsname_release": "5.15.90lb9.01",
>> >      "utsname_sysname": "Linux",
>> >      "utsname_version": "#1 SMP Fri Jan 27 15:52:13 CET 2023"
>> > }
>> >
>> >
>> > I was trying to figure out why these particular 3 nodes could behave
>> > differently and found out from colleagues that those 3 nodes were added to
>> > the cluster recently with a direct install of 17.2.5 (the others were
>> > installed with 15.2.16 and upgraded later).
>> >
>> > I'm not sure whether this is related to our problem, though.
>> >
>> > I see a very similar crash reported here:
>> > https://tracker.ceph.com/issues/56346
>> > so I'm not reporting it separately.
>> >
>> > Do you think this might somehow be the cause of the problem? Anything
>> else I should
>> > check in perf dumps or elsewhere?
>>
>> Hmm... I don't know yet. Could you please share the last 20K log lines prior
>> to the crash from, e.g., two sample OSDs?
>>
>> And the crash isn't permanent - the OSDs are able to start after the
>> second(?) attempt, aren't they?
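>>
>> A rough way to grab those lines for one OSD (assuming classic log files under
>> /var/log/ceph; containerized deployments would need journalctl or the cephadm
>> log tooling instead, and the timestamp may need adjusting to the local log
>> format):
>>
>>     log=/var/log/ceph/ceph-osd.98.log
>>     crash_line=$(grep -n '2023-05-08T17:45:47' "$log" | head -1 | cut -d: -f1)
>>     head -n "$crash_line" "$log" | tail -n 20000 > osd.98_precrash.log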
>>
>> > with best regards
>> >
>> > nik
>> >
>> >
>> >
>> >
>> >
>> >
>> --
>> Igor Fedotov
>> Ceph Lead Developer
>> --
>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>> CEO: Martin Verges - VAT-ID: DE310638492
>> Com. register: Amtsgericht Munich HRB 231263
>> Web <https://croit.io/> | LinkedIn <http://linkedin.com/company/croit> |
>> Youtube <https://www.youtube.com/channel/UCIJJSKVdcSLGLBtwSFx_epw> |
>> Twitter <https://twitter.com/croit_io>
>>
>> Meet us at the SC22 Conference! Learn more <https://croit.io/croit-sc22>
>> Technology Fast50 Award Winner by Deloitte
>> <
>> https://www2.deloitte.com/de/de/pages/technology-media-and-telecommunications/articles/fast-50-2022-germany-winners.html
>> >!
>>
>> <
>> https://www2.deloitte.com/de/de/pages/technology-media-and-telecommunications/articles/fast-50-2022-germany-winners.html
>> >
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
> --
> Igor Fedotov
> Ceph Lead Developer
> --
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web <https://croit.io/> | LinkedIn <http://linkedin.com/company/croit> |
> Youtube <https://www.youtube.com/channel/UCIJJSKVdcSLGLBtwSFx_epw> |
> Twitter <https://twitter.com/croit_io>
>
> Meet us at the SC22 Conference! Learn more <https://croit.io/croit-sc22>
> Technology Fast50 Award Winner by Deloitte
> <https://www2.deloitte.com/de/de/pages/technology-media-and-telecommunications/articles/fast-50-2022-germany-winners.html>
> !
>
>
> <https://www2.deloitte.com/de/de/pages/technology-media-and-telecommunications/articles/fast-50-2022-germany-winners.html>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


