Re: Slow ops during index pool recovery causes cluster performance drop to 1%

Tyler Stachecki <stachecki.tyler@xxxxxxxxx> · Sun, 3 Nov 2024 10:07:36 -0500

On Sun, Nov 3, 2024 at 1:28 AM Szabo, Istvan (Agoda)
<Istvan.Szabo@xxxxxxxxx> wrote:
> Hi,
>
> I'm updating from Octopus to Quincy, and across our cluster, whenever index pool recovery kicks off, cluster operations drop to 1% and slow ops appear non-stop.
> The recovery takes 1-2 hours per node.
>
> What I can see is that iowait on the NVMe drives belonging to the index pool is pretty high, yet throughput is below 500 MB/s and IOPS is below 5000.
...
> After the update and machine reboot, compaction kicks off, which drives iowait on the node to 30-40. We handle this with the "noup" flag, which keeps these OSDs from being marked up in the cluster until compaction finishes. However, once iowait is back to 0 after compaction, I unset noup so recovery can start, and that is what causes the issue above. If I didn't set noup, the problem would be even bigger.
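
The noup workflow described in the quoted message (hold the OSDs down until the post-reboot compaction settles, then let them come up) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `ceph osd set noup` and `ceph osd unset noup` are the standard cluster-wide flag commands, but `gated_rejoin`, `upgrade_and_reboot` and `compaction_done` are hypothetical names (for example, `compaction_done` might check that iowait on the node's NVMe drives is back near zero, as the poster does by hand). The command runner is injectable so the sequence can be dry-run.

```python
import subprocess
import time
from typing import Callable, Sequence

# Standard cluster-wide flag commands; while noup is set, no OSD is marked up.
SET_NOUP = ["ceph", "osd", "set", "noup"]
UNSET_NOUP = ["ceph", "osd", "unset", "noup"]


def run_ceph(cmd: Sequence[str]) -> None:
    """Run one ceph CLI command, raising CalledProcessError on failure."""
    subprocess.run(list(cmd), check=True)


def gated_rejoin(
    upgrade_and_reboot: Callable[[], None],
    compaction_done: Callable[[], bool],
    poll_seconds: float = 30.0,
    runner: Callable[[Sequence[str]], None] = run_ceph,
) -> None:
    """Hold OSDs down with noup until post-reboot compaction settles.

    upgrade_and_reboot and compaction_done are hypothetical callbacks
    supplied by the operator; nothing here is a Ceph API.
    """
    runner(SET_NOUP)              # 1. set noup before the OSDs restart
    upgrade_and_reboot()          # 2. upgrade packages and reboot the node
    while not compaction_done():  # 3. poll until compaction has finished
        time.sleep(poll_seconds)
    runner(UNSET_NOUP)            # 4. let the OSDs come up; recovery starts
```

Note that unsetting noup is exactly the point where the reported slow ops begin, so this sketch only reproduces the procedure; it does not address the recovery problem itself.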

By any chance, are you specifying a value for
bluestore_rocksdb_options in your ceph.conf? The compaction you
observe at reboot in particular is odd.
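
One way to answer that question mechanically is to scan ceph.conf for any section that overrides the option. The Python sketch below does that with the standard `configparser` module. It assumes the file is INI-style with `=` as the delimiter, and that Ceph treats spaces, dashes and underscores in option names as equivalent (so `bluestore rocksdb options` also matches). `find_option_overrides` is a hypothetical helper, not part of any Ceph tooling.

```python
import configparser
from typing import Dict


def _normalize(name: str) -> str:
    # Assumption: "bluestore rocksdb options", "bluestore-rocksdb-options"
    # and "bluestore_rocksdb_options" all name the same Ceph option.
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def find_option_overrides(
    conf_text: str, option: str = "bluestore_rocksdb_options"
) -> Dict[str, str]:
    """Return {section: value} for every ceph.conf section that sets `option`."""
    parser = configparser.ConfigParser(
        interpolation=None,  # take values literally (no %-expansion)
        strict=False,        # tolerate repeated sections/keys; last one wins
        delimiters=("=",),   # ceph.conf uses "key = value"
    )
    parser.read_string(conf_text)
    wanted = _normalize(option)
    found: Dict[str, str] = {}
    for section in parser.sections():
        for key, value in parser.items(section, raw=True):
            if _normalize(key) == wanted:
                found[section] = value
    return found
```

This only inspects the file; a value set in the monitor configuration database would not show up here, so the effective running value is worth checking as well (for example with `ceph config show osd.<id>`).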

Tyler
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx