Re: Migration Nautilus to Pacific : Very high latencies (EC profile)

Hi Josh,

I'm working with Stéphane and I'm the "ceph admin" (big words ^^) in our team.
So, yes, as part of the upgrade we did the offline repair to split the omap by pool.
The quick fix on mount is, as far as I know, still disabled by default.
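For anyone who needs to do the same, the offline conversion was roughly the following (a sketch, assuming a package-based install with systemd units; the OSD id 12 and the path are placeholders):

  systemctl stop ceph-osd@12
  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-12
  systemctl start ceph-osd@12

The online alternative is to set bluestore_fsck_quick_fix_on_mount to true so the conversion runs at OSD startup, but as said above it is still off by default.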

On I/O and CPU load, we haven't seen a really big change between Nautilus and Pacific, just an increase in disk latency; in the end, the "ceph read operations" metric dropped from 20K to 5K or less.
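Side note for anyone comparing: roughly the same trend (client read op/s and per-OSD latency) can be watched from the CLI, no dashboard needed:

  ceph -s         # the "io: client:" line shows read/write op/s
  ceph osd perf   # per-OSD commit/apply latency in ms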

But yes, a lot of slow ops were appearing as time passed.

At this time, we have taken one of our data nodes completely out and recreated from scratch 5 of its 8 OSD daemons (DB on SSD, data on spinning drive).
The result seems very good at this moment (we're seeing better metrics than under Nautilus).

Since the recreation, I have changed 3 parameters, applied with the commands shown below the list:
bdev_async_discard => osd : true
bdev_enable_discard => osd : true
bdev_aio_max_queue_depth => osd: 8192
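Concretely, something like this (a sketch, assuming the settings go through the centralized config and the OSDs are systemd-managed):

  ceph config set osd bdev_async_discard true
  ceph config set osd bdev_enable_discard true
  ceph config set osd bdev_aio_max_queue_depth 8192
  systemctl restart ceph-osd.target   # restart so the bdev_* changes take effect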

The first two have been extremely helpful for our SSD pool; even with enterprise-grade SSDs, the "trim" seems to have rejuvenated the pool.
The last one was set in response to messages on the newly created OSDs:
"bdev(0x55588e220400 <path to block>) aio_submit retries XX"
After changing it and restarting the OSD process, the messages were gone, and it seems to have had a beneficial effect on our data node.
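For anyone wanting to check for the same symptom, the message shows up in the OSD log; something like this (OSD id is a placeholder, use whichever logging path applies to your setup):

  grep 'aio_submit retries' /var/log/ceph/ceph-osd.12.log
  journalctl -u ceph-osd@12 | grep 'aio_submit retries'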

I've seen that 16.2.8 came out yesterday, but I'm a little confused by these two changelog entries:
[Revert] bluestore: set upper and lower bounds on rocksdb omap iterators (pr#46092, Neha Ojha)
bluestore: set upper and lower bounds on rocksdb omap iterators (pr#45963, Cory Snyder)

(these two lines seem related to https://tracker.ceph.com/issues/55324).

One step forward, one step backward?

Hubert Beaudichon


-----Original Message-----
From: Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
Sent: Monday, May 16, 2022 16:56
To: stéphane chalansonnet <schalans@xxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re: Migration Nautilus to Pacific : Very high latencies (EC profile)

Hi Stéphane,

On Sat, May 14, 2022 at 4:27 AM stéphane chalansonnet <schalans@xxxxxxxxx> wrote:
> After a successful update from Nautilus to Pacific on Centos8.5, we 
> observed some high latencies on our cluster.

As a part of this upgrade, did you also migrate the OSDs to sharded rocksdb column families? This would have been done by setting bluestore's "quick fix on mount" setting to true or by issuing a "ceph-bluestore-tool repair" offline, perhaps in response to a BLUESTORE_NO_PER_POOL_OMAP warning post-upgrade.

I ask because I'm wondering if you're hitting https://tracker.ceph.com/issues/55324, for which there is a fix coming in 16.2.8. If you inspect the nodes and disks involved in your EC pool, are you seeing high read or write I/O? High CPU usage?

Josh
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



