Hello Vladimir,

I just tested this with a single-node test cluster with 60 HDDs (3 of them with BlueStore without a separate WAL/DB device). With 14.2.10, I see a lot of read IOPS on the BlueStore OSDs while snaptrimming. With 14.2.9 this was not an issue.

I wonder if this would explain the huge number of slow ops on my big test cluster (44 nodes, 1056 OSDs) while snaptrimming. I cannot test a downgrade there, because there are no packages of older releases available for CentOS 8.

Regards
Manuel

On Tue, 4 Aug 2020 13:22:34 +0300 Vladimir Prokofev <v@xxxxxxxxxxx> wrote:

> Here's some more insight into the issue.
> It looks like the load is triggered by a snaptrim operation. We have a
> backup pool that serves as OpenStack cinder-backup storage, performing
> snapshot backups every night. Old backups are also deleted every night,
> so a snaptrim is initiated.
> This snaptrim increased the load on the block.db devices after the
> upgrade, and it just kills one SSD's performance in particular. That SSD
> serves as the block.db/wal device for one of the fatter backup-pool
> OSDs, which has more PGs placed on it.
> It is a Kingston SSD, and we see this issue on other Kingston SSD
> journals too. Intel SSD journals are not as badly affected, though they
> also experience increased load.
> Nevertheless, there are now a lot of read IOPS on the block.db devices
> after the upgrade that were not there before.
> I wonder how 600 IOPS can destroy an SSD's performance that badly.
>
> On Tue, 4 Aug 2020 at 12:54, Vladimir Prokofev <v@xxxxxxxxxxx> wrote:
>
> > Good day, cephers!
> >
> > We've recently upgraded our cluster from the 14.2.8 to the 14.2.10
> > release, also performing a full system package upgrade (Ubuntu 18.04
> > LTS). After that, performance dropped significantly, the main reason
> > being that the journal SSDs now have no merges, huge queues, and
> > increased latency. There are a few screenshots in the attachments.
> > This is for an SSD journal that backs block.db/block.wal for 3
> > spinning OSDs, and it looks like this for all our SSD block.db/wal
> > devices across all nodes. Any ideas what may cause this? Maybe I've
> > missed something important in the release notes?
> >
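
For anyone who wants to put numbers on the extra block.db reads during
snaptrim, here is a rough sketch: it samples the BlueFS perf counters of
one OSD twice via the admin socket and prints the per-second read rate in
between. It assumes the counters are named "read_count",
"read_random_count" and "read_bytes" (that is what my Nautilus nodes show,
but the names may differ between releases), so check the output of
"ceph daemon osd.N perf dump" on an OSD node first. The OSD id and the
sampling interval are placeholders to adapt.

#!/usr/bin/env python3
# Rough sketch: sample the BlueFS perf counters of one local OSD twice
# via the admin socket and print the read rates seen in between.
# Run on the OSD node as a user that can talk to the admin socket (root).
import json
import subprocess
import sys
import time

def bluefs_counters(osd_id):
    # "ceph daemon osd.N perf dump" prints JSON; BlueStore OSDs expose a
    # "bluefs" section with read/write counters for the DB/WAL device.
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.{}".format(osd_id), "perf", "dump"])
    return json.loads(out)["bluefs"]

def main():
    osd_id = sys.argv[1] if len(sys.argv) > 1 else "0"  # OSD id to sample
    interval = 10                                       # seconds between samples
    before = bluefs_counters(osd_id)
    time.sleep(interval)
    after = bluefs_counters(osd_id)
    # Counter names as seen on my nodes; anything missing just prints as 0.
    for key in ("read_count", "read_random_count", "read_bytes"):
        delta = after.get(key, 0) - before.get(key, 0)
        print("{}: {:.1f}/s".format(key, delta / interval))

if __name__ == "__main__":
    main()

If the extra reads themselves cannot be avoided, raising
osd_snap_trim_sleep should at least spread the trimming (and its read
load) over a longer window, at the cost of snapshots being removed more
slowly.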