Hi Istvan,

Is your upgraded cluster using the wpq or the mclock scheduler? (ceph tell osd.X config show | grep osd_op_queue)

Maybe your OSDs set their osd_mclock_max_capacity_iops_* capacity too low on start (ceph config dump | grep osd_mclock_max_capacity_iops), limiting their performance. You might want to raise these figures if they are set, or go back to wpq to give yourself enough time to understand how mclock works (example commands at the bottom of this message).

Also check bluefs_buffered_io, as its default value has changed over time. It is better set to 'true' now (ceph tell osd.X config show | grep bluefs_buffered_io).

Also check for any spillover, as there has been a bug in the past where spillover was not reported in ceph status (ceph tell osd.X bluefs stats; the SLOW line should show 0 Bytes and 0 FILES).

Regards,
Frédéric.

----- On 4 Nov 24, at 5:24, Istvan Szabo, Agoda Istvan.Szabo@xxxxxxxxx wrote:

> Hi Tyler,
>
> To be honest we don't have anything set ourselves regarding compaction and
> RocksDB. When I check the socket with ceph daemon, both the nvme and the ssd
> OSDs have the default 'false' for compaction on start:
> "mon_compact_on_start": "false",
> "osd_compact_on_start": "false",
>
> The RocksDB options are also at the default:
> "bluestore_rocksdb_options":
> "compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,max_total_wal_size=1073741824"
>
> This is one event out of the 20 during the slow ops:
> https://gist.githubusercontent.com/Badb0yBadb0y/30de736f5d2bd6ec48aa7acf0a3caa14/raw/1070acbf82cc8d69efc04e4e0583e7f83bd33b3f/gistfile1.txt
>
> All of them belong to a bucket doing streaming operations, which means
> continuous deletes and uploads 24/7.
>
> I can see the throttled entries, but I still don't understand why the
> latency is so high.
>
> ty
>
> ________________________________
> From: Tyler Stachecki <stachecki.tyler@xxxxxxxxx>
> Sent: Sunday, November 3, 2024 4:07 PM
> To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxx>
> Subject: Re: Re: Slow ops during index pool recovery causes cluster
> performance drop to 1%
>
> On Sun, Nov 3, 2024 at 1:28 AM Szabo, Istvan (Agoda)
> <Istvan.Szabo@xxxxxxxxx> wrote:
>> Hi,
>>
>> I'm upgrading our cluster from Octopus to Quincy, and whenever index pool
>> recovery kicks off, cluster operations drop to 1% and slow ops come in
>> non-stop. The recovery takes 1-2 hours per node.
>>
>> What I can see is that the iowait on the nvme drives which belong to the
>> index pool is pretty high; however, the throughput is less than 500 MB/s
>> and the iops are less than 5000/sec.
> ...
>> After the update and machine reboot, compaction kicks off, which generates
>> 30-40 iowait on the node. We set the "noup" flag to prevent these OSDs from
>> joining the cluster until compaction has finished; once iowait is back to 0
>> after compaction, I unset noup so recovery can start, which causes the
>> above issue. If I didn't set noup it would cause an even bigger issue.
>
> By any chance, are you specifying a value for
> bluestore_rocksdb_options in your ceph.conf? The compaction
> observation at reboot in particular is odd.
>
> Tyler
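For reference, a minimal sketch of the checks and changes suggested at the top of this message, assuming osd.0 as a placeholder OSD id and 20000 as a purely illustrative IOPS figure; verify the option names and values against your own release before applying anything:

# Which scheduler is in use (wpq or mclock)?
ceph tell osd.0 config show | grep osd_op_queue

# Did mclock record a too-low capacity for the drives at startup?
ceph config dump | grep osd_mclock_max_capacity_iops

# Either raise the recorded capacity (20000 is only an example figure),
# or remove the entry so the OSD measures it again on its next restart...
ceph config set osd.0 osd_mclock_max_capacity_iops_ssd 20000
ceph config rm osd.0 osd_mclock_max_capacity_iops_ssd

# ...or fall back to the wpq scheduler for now
# (takes effect after an OSD restart)
ceph config set osd osd_op_queue wpq

# Make sure bluefs_buffered_io is true
ceph config set osd bluefs_buffered_io true

# Check for spillover: the SLOW line should show 0 Bytes and 0 FILES
ceph tell osd.0 bluefs stats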
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx