Hello Vladimir,

I just tested this with a single-node test cluster with 60 HDDs (3 of them with BlueStore without a separate WAL/DB device). With 14.2.10, I see a lot of read IOPS on the BlueStore OSDs while snaptrimming. With 14.2.9 this was not an issue.

I wonder if this would explain the huge number of slow ops on my big test cluster (44 nodes, 1056 OSDs) while snaptrimming. I cannot test a downgrade there, because there are no packages of older releases available for CentOS 8.

Regards
Manuel

On Tue, 4 Aug 2020 13:22:34 +0300 Vladimir Prokofev <v@xxxxxxxxxxx> wrote:

> Here's some more insight into the issue.
> It looks like the load is triggered by a snaptrim operation. We have a
> backup pool that serves as OpenStack cinder-backup storage, performing
> snapshot backups every night. Old backups are also deleted every night,
> so a snaptrim is initiated.
> This snaptrim increased the load on the block.db devices after the
> upgrade, and it just kills one SSD's performance in particular. That SSD
> serves as the block.db/wal device for one of the fatter backup-pool
> OSDs, which has more PGs placed on it.
> It is a Kingston SSD, and we see this issue on other Kingston SSD
> journals too. Intel SSD journals are not as badly affected, though they
> also experience increased load.
> Nevertheless, there are now a lot of read IOPS on the block.db devices
> after the upgrade that were not there before.
> I wonder how 600 IOPS can destroy an SSD's performance that badly.
>
> On Tue, 4 Aug 2020 at 12:54, Vladimir Prokofev <v@xxxxxxxxxxx> wrote:
>
> > Good day, cephers!
> >
> > We've recently upgraded our cluster from the 14.2.8 to the 14.2.10
> > release, also performing a full system package upgrade (Ubuntu 18.04
> > LTS). After that, performance dropped significantly, the main reason
> > being that the journal SSDs now have no merges, huge queues, and
> > increased latency. There are a few screenshots in the attachments.
> > This is for an SSD journal that backs block.db/block.wal for 3
> > spinning OSDs, and it looks like this for all our SSD block.db/wal
> > devices across all nodes. Any ideas what may cause this? Maybe I've
> > missed something important in the release notes?
> >
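
For anyone who wants to put numbers on the extra block.db reads during
snaptrim, here is a rough sketch: it samples the BlueFS perf counters of
one OSD twice via the admin socket and prints the per-second read rate in
between. It assumes the counters are named "read_count",
"read_random_count" and "read_bytes" (that is what my Nautilus nodes show,
but the names may differ between releases), so check the output of
"ceph daemon osd.N perf dump" on an OSD node first. The OSD id and the
sampling interval are placeholders to adapt.

#!/usr/bin/env python3
# Rough sketch: sample the BlueFS perf counters of one local OSD twice
# via the admin socket and print the read rates seen in between.
# Run on the OSD node as a user that can talk to the admin socket (root).
import json
import subprocess
import sys
import time

def bluefs_counters(osd_id):
    # "ceph daemon osd.N perf dump" prints JSON; BlueStore OSDs expose a
    # "bluefs" section with read/write counters for the DB/WAL device.
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.{}".format(osd_id), "perf", "dump"])
    return json.loads(out)["bluefs"]

def main():
    osd_id = sys.argv[1] if len(sys.argv) > 1 else "0"  # OSD id to sample
    interval = 10                                       # seconds between samples
    before = bluefs_counters(osd_id)
    time.sleep(interval)
    after = bluefs_counters(osd_id)
    # Counter names as seen on my nodes; anything missing just prints as 0.
    for key in ("read_count", "read_random_count", "read_bytes"):
        delta = after.get(key, 0) - before.get(key, 0)
        print("{}: {:.1f}/s".format(key, delta / interval))

if __name__ == "__main__":
    main()

If the extra reads themselves cannot be avoided, raising
osd_snap_trim_sleep should at least spread the trimming (and its read
load) over a longer window, at the cost of snapshots being removed more
slowly.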