We don't have any that wouldn't have the problem. That said, we've already
got a PR out for the 16.2.8 issue we encountered, so I would expect a
relatively quick update, assuming no issues are found during testing.

On Tue, May 17, 2022 at 1:21 PM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx> wrote:

> What was the largest cluster that you upgraded that didn't exhibit the new
> issue in 16.2.8? Thanks.
>
> Respectfully,
>
> *Wes Dillingham*
> wes@xxxxxxxxxxxxxxxxx
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, May 17, 2022 at 10:24 AM David Orman <ormandj@xxxxxxxxxxxx> wrote:
>
>> We had an issue with our original fix in 45963, which was resolved in
>> https://github.com/ceph/ceph/pull/46096. It includes the fix as well as
>> handling for upgraded clusters. This is in the 16.2.8 release. I'm not
>> sure if it will resolve your problem (or help mitigate it), but it would
>> be worth trying.
>>
>> Heads-up on 16.2.8, though: see the release thread; we ran into an issue
>> with it on our larger clusters: https://tracker.ceph.com/issues/55687
>>
>> On Tue, May 17, 2022 at 3:44 AM BEAUDICHON Hubert (Acoss) <
>> hubert.beaudichon@xxxxxxxx> wrote:
>>
>> > Hi Josh,
>> >
>> > I'm working with Stéphane and I'm the "ceph admin" (big words ^^) in
>> > our team.
>> > So, yes, as part of the upgrade we've done the offline repair to split
>> > the omap by pool.
>> > The quick fix is, as far as I know, still disabled by default.
>> >
>> > On I/O and CPU load, between Nautilus and Pacific we haven't seen a
>> > really big change, just an increase in disk latency, and in the end the
>> > "ceph read operation" metric dropped from 20K to 5K or less.
>> >
>> > But yes, a lot of slow IOPS were emerging as time passed.
>> >
>> > At this point we have taken one of our data nodes completely out and
>> > recreated 5 of its 8 OSD daemons from scratch (DB on SSD, data on
>> > spinning drive).
>> > The results look very good so far (we're seeing better metrics than
>> > under Nautilus).
>> >
>> > Since the recreation, I have changed 3 parameters:
>> > bdev_async_discard => osd : true
>> > bdev_enable_discard => osd : true
>> > bdev_aio_max_queue_depth => osd : 8192
>> >
>> > The first two have been extremely helpful for our SSD pool; even with
>> > enterprise-grade SSDs, the "trim" seems to have rejuvenated the pool.
>> > The last one was set in response to messages on the newly created OSDs:
>> > "bdev(0x55588e220400 <path to block>) aio_submit retries XX"
>> > After changing it and restarting the OSD processes, the messages were
>> > gone, and it seems to have had a beneficial effect on our data node.
>> >
>> > I saw that 16.2.8 came out yesterday, but I'm a little confused by:
>> > [Revert] bluestore: set upper and lower bounds on rocksdb omap iterators
>> > (pr#46092, Neha Ojha)
>> > bluestore: set upper and lower bounds on rocksdb omap iterators
>> > (pr#45963, Cory Snyder)
>> >
>> > (These two lines seem related to https://tracker.ceph.com/issues/55324.)
>> >
>> > One step forward, one step backward?
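
For reference, the three changes above map onto the centralized config store.
A minimal sketch, assuming the options are managed with "ceph config" rather
than a local ceph.conf; as Hubert notes, the OSDs were restarted afterwards
for the change to take effect, and the restart command below is a placeholder
that will differ under cephadm/containers:

    ceph config set osd bdev_enable_discard true        # issue discards/TRIM to the underlying device
    ceph config set osd bdev_async_discard true         # issue those discards asynchronously, off the I/O path
    ceph config set osd bdev_aio_max_queue_depth 8192   # deeper aio queue, per the "aio_submit retries" messages
    systemctl restart ceph-osd@<id>                     # then restart each OSD, one failure domain at a time

Enabling discards mainly helps flash-backed OSDs, which matches the effect
described above on the SSD pool.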
>> >
>> > Hubert Beaudichon
>> >
>> >
>> > -----Original Message-----
>> > From: Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
>> > Sent: Monday, May 16, 2022 16:56
>> > To: stéphane chalansonnet <schalans@xxxxxxxxx>
>> > Cc: ceph-users@xxxxxxx
>> > Subject: Re: Migration Nautilus to Pacific: Very high latencies (EC profile)
>> >
>> > Hi Stéphane,
>> >
>> > On Sat, May 14, 2022 at 4:27 AM stéphane chalansonnet <schalans@xxxxxxxxx>
>> > wrote:
>> > > After a successful update from Nautilus to Pacific on CentOS 8.5, we
>> > > observed some high latencies on our cluster.
>> >
>> > As part of this upgrade, did you also migrate the OSDs to sharded
>> > rocksdb column families? This would have been done by setting
>> > bluestore's "quick fix on mount" setting to true or by issuing a
>> > "ceph-bluestore-tool repair" offline, perhaps in response to a
>> > BLUESTORE_NO_PER_POOL_OMAP warning post-upgrade.
>> >
>> > I ask because I'm wondering if you're hitting
>> > https://tracker.ceph.com/issues/55324, for which there is a fix coming
>> > in 16.2.8. If you inspect the nodes and disks involved in your EC pool,
>> > are you seeing high read or write I/O? High CPU usage?
>> >
>> > Josh
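
A minimal sketch of the checks and of the offline conversion Josh mentions,
assuming a package-based deployment (systemd units, OSD data under
/var/lib/ceph/osd); the OSD id "3" and the paths are placeholders:

    ceph config get osd bluestore_fsck_quick_fix_on_mount    # is the on-mount conversion enabled?
    ceph health detail                                       # lists BLUESTORE_NO_PER_POOL_OMAP if OSDs still need converting

    systemctl stop ceph-osd@3                                # convert offline, one OSD at a time
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-3
    systemctl start ceph-osd@3

    iostat -x 5                                              # per-disk read/write load on the hosts backing the EC pool
    top                                                      # CPU usage of the ceph-osd processes
    ceph osd perf                                            # commit/apply latency as reported by each OSD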