What was the largest cluster that you upgraded that didn't exhibit the new
issue in 16.2.8? Thanks.

Respectfully,

*Wes Dillingham*
wes@xxxxxxxxxxxxxxxxx
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, May 17, 2022 at 10:24 AM David Orman <ormandj@xxxxxxxxxxxx> wrote:

> We had an issue with our original fix in 45963, which was resolved in
> https://github.com/ceph/ceph/pull/46096. It includes the fix as well as
> handling for upgraded clusters. This is in the 16.2.8 release. I'm not
> sure if it will resolve your problem (or help mitigate it), but it would
> be worth trying.
>
> Heads-up on 16.2.8, though: see the release thread; we ran into an issue
> with it on our larger clusters: https://tracker.ceph.com/issues/55687
>
> On Tue, May 17, 2022 at 3:44 AM BEAUDICHON Hubert (Acoss) <
> hubert.beaudichon@xxxxxxxx> wrote:
>
> > Hi Josh,
> >
> > I'm working with Stéphane and I'm the "ceph admin" (big words ^^) in
> > our team.
> > So, yes, as part of the upgrade we did the offline repair to split the
> > omap by pool.
> > The quick fix is, as far as I know, still disabled in the default
> > properties.
> >
> > Regarding I/O and CPU load, between Nautilus and Pacific we haven't
> > seen a really big change, just an increase in disk latency; in the end,
> > the "ceph read operation" metric dropped from 20K to 5K or less.
> >
> > But yes, a lot of slow IOPS were emerging as time passed.
> >
> > At this time, we have taken one of our data nodes completely out and
> > recreated 5 of its 8 OSD daemons from scratch (DB on SSD, data on
> > spinning drive).
> > The results seem very good at the moment (we're seeing better metrics
> > than under Nautilus).
> >
> > Since the recreation, I have changed 3 parameters:
> > bdev_async_discard => osd : true
> > bdev_enable_discard => osd : true
> > bdev_aio_max_queue_depth => osd : 8192
> >
> > The first two have been extremely helpful for our SSD pool; even with
> > enterprise-grade SSDs, the "trim" seems to have rejuvenated our pool.
> > The last one was set in response to messages on the newly created OSDs:
> > "bdev(0x55588e220400 <path to block>) aio_submit retries XX"
> > After changing them and restarting the OSD processes, the messages were
> > gone, and it seems to have had a beneficial effect on our data node.
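> >
> > (For anyone who wants to try the same settings: one way to apply them
> > cluster-wide is through the monitors' config database, roughly as
> > sketched below. That's just one option — they can also go in ceph.conf —
> > and either way the OSDs need a restart before bluestore picks the
> > values up.)
> >
> >     # apply to all OSDs; takes effect at the next OSD restart
> >     ceph config set osd bdev_enable_discard true
> >     ceph config set osd bdev_async_discard true
> >     ceph config set osd bdev_aio_max_queue_depth 8192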
> >
> > I've seen that 16.2.8 came out yesterday, but I'm a little confused by:
> > [Revert] bluestore: set upper and lower bounds on rocksdb omap iterators
> > (pr#46092, Neha Ojha)
> > bluestore: set upper and lower bounds on rocksdb omap iterators
> > (pr#45963, Cory Snyder)
> >
> > (These two lines seem related to https://tracker.ceph.com/issues/55324.)
> >
> > One step forward, one step backward?
> >
> > Hubert Beaudichon
> >
> >
> > -----Original Message-----
> > From: Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
> > Sent: Monday, May 16, 2022 16:56
> > To: stéphane chalansonnet <schalans@xxxxxxxxx>
> > Cc: ceph-users@xxxxxxx
> > Subject: Re: Migration Nautilus to Pacific: Very high latencies
> > (EC profile)
> >
> > Hi Stéphane,
> >
> > On Sat, May 14, 2022 at 4:27 AM stéphane chalansonnet <
> > schalans@xxxxxxxxx> wrote:
> > > After a successful update from Nautilus to Pacific on CentOS 8.5, we
> > > observed some high latencies on our cluster.
> >
> > As a part of this upgrade, did you also migrate the OSDs to sharded
> > rocksdb column families? This would have been done by setting
> > bluestore's "quick fix on mount" setting to true or by issuing a
> > "ceph-bluestore-tool repair" offline, perhaps in response to a
> > BLUESTORE_NO_PER_POOL_OMAP warning post-upgrade.
> >
> > I ask because I'm wondering if you're hitting
> > https://tracker.ceph.com/issues/55324, for which there is a fix coming
> > in 16.2.8. If you inspect the nodes and disks involved in your EC pool,
> > are you seeing high read or write I/O? High CPU usage?
> >
> > Josh
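> >
> > P.S. In case it helps, that conversion is usually triggered in one of
> > the two ways sketched below. The systemd unit name and data path assume
> > a package-based (non-cephadm) deployment, and <id> stands for the OSD
> > id — adjust for your setup.
> >
> >     # online: convert each OSD automatically at its next restart
> >     ceph config set osd bluestore_fsck_quick_fix_on_mount true
> >
> >     # offline: stop the OSD, then repair it in place
> >     systemctl stop ceph-osd@<id>
> >     ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>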