Hi Istvan,

Is your upgraded cluster using the wpq or the mclock scheduler? (ceph tell osd.X config show | grep osd_op_queue)

Maybe your OSDs set their osd_mclock_max_capacity_iops_* capacity too low on start (ceph config dump | grep osd_mclock_max_capacity_iops), limiting their performance. You might want to raise these figures if they are set, or go back to wpq to give yourself enough time to understand how mclock works (example commands at the bottom of this message).

Also check bluefs_buffered_io, as its default value has changed over time. It is better set to 'true' now (ceph tell osd.X config show | grep bluefs_buffered_io).

Also check for any spillover, as there has been a bug in the past where spillover was not reported in ceph status (ceph tell osd.X bluefs stats; the SLOW line should show 0 Bytes and 0 FILES).

Regards,
Frédéric.

----- On 4 Nov 24, at 5:24, Istvan Szabo, Agoda Istvan.Szabo@xxxxxxxxx wrote:

> Hi Tyler,
>
> To be honest we don't have anything set ourselves regarding compaction and
> RocksDB. When I check the socket with ceph daemon, both the nvme and the ssd
> OSDs have the default 'false' for compaction on start:
> "mon_compact_on_start": "false",
> "osd_compact_on_start": "false",
>
> The RocksDB options are also at the default:
> "bluestore_rocksdb_options":
> "compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,max_total_wal_size=1073741824"
>
> This is one event out of the 20 during the slow ops:
> https://gist.githubusercontent.com/Badb0yBadb0y/30de736f5d2bd6ec48aa7acf0a3caa14/raw/1070acbf82cc8d69efc04e4e0583e7f83bd33b3f/gistfile1.txt
>
> All of them belong to a bucket doing streaming operations, which means
> continuous deletes and uploads 24/7.
>
> I can see the throttled entries, but I still don't understand why the
> latency is so high.
>
> ty
>
> ________________________________
> From: Tyler Stachecki <stachecki.tyler@xxxxxxxxx>
> Sent: Sunday, November 3, 2024 4:07 PM
> To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxx>
> Subject: Re: Re: Slow ops during index pool recovery causes cluster
> performance drop to 1%
>
> On Sun, Nov 3, 2024 at 1:28 AM Szabo, Istvan (Agoda)
> <Istvan.Szabo@xxxxxxxxx> wrote:
>> Hi,
>>
>> I'm upgrading our cluster from Octopus to Quincy, and whenever index pool
>> recovery kicks off, cluster operations drop to 1% and slow ops come in
>> non-stop. The recovery takes 1-2 hours per node.
>>
>> What I can see is that the iowait on the nvme drives which belong to the
>> index pool is pretty high; however, the throughput is less than 500 MB/s
>> and the iops are less than 5000/sec.
> ...
>> After the update and machine reboot, compaction kicks off, which generates
>> 30-40 iowait on the node. We set the "noup" flag to prevent these OSDs from
>> joining the cluster until compaction has finished; once iowait is back to 0
>> after compaction, I unset noup so recovery can start, which causes the
>> above issue. If I didn't set noup it would cause an even bigger issue.
>
> By any chance, are you specifying a value for
> bluestore_rocksdb_options in your ceph.conf? The compaction
> observation at reboot in particular is odd.
>
> Tyler
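For reference, a minimal sketch of the checks and changes suggested at the top of this message, assuming osd.0 as a placeholder OSD id and 20000 as a purely illustrative IOPS figure; verify the option names and values against your own release before applying anything:

# Which scheduler is in use (wpq or mclock)?
ceph tell osd.0 config show | grep osd_op_queue

# Did mclock record a too-low capacity for the drives at startup?
ceph config dump | grep osd_mclock_max_capacity_iops

# Either raise the recorded capacity (20000 is only an example figure),
# or remove the entry so the OSD measures it again on its next restart...
ceph config set osd.0 osd_mclock_max_capacity_iops_ssd 20000
ceph config rm osd.0 osd_mclock_max_capacity_iops_ssd

# ...or fall back to the wpq scheduler for now
# (takes effect after an OSD restart)
ceph config set osd osd_op_queue wpq

# Make sure bluefs_buffered_io is true
ceph config set osd bluefs_buffered_io true

# Check for spillover: the SLOW line should show 0 Bytes and 0 FILES
ceph tell osd.0 bluefs stats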
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx