Re: deep-scrub / backfilling: large amount of SLOW_OPS after upgrade to 13.2.8

I encountered persistent SLOW_OPS just a few days ago on a recently upgraded 13.2.8 cluster, which has an SSD pool and an HDD pool. All OSDs are BlueStore, and we are not using separate journal/DB volumes. The HDD pool is more or less used for cold storage, so performance there is not critical.
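For reference, this is the sort of thing I mean by the pool/OSD layout; the OSD ID below is just an example:

    # per-OSD utilization, grouped by CRUSH tree and device class
    ceph osd df tree

    # pool definitions, including which CRUSH rule each pool uses
    ceph osd pool ls detail

    # confirm an OSD is BlueStore and check its BlueFS/DB layout (osd.12 is a placeholder)
    ceph osd metadata 12 | grep -E 'osd_objectstore|bluefs'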

One OSD in particular (an HDD) was reporting the SLOW_OPS. I suspected the drive was on its way out, but the SMART stats looked OK and there were no I/O errors in the kernel log. Restarting that OSD helped initially, but eventually the SLOW_OPS started to pile up again.
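In case it helps anyone chasing something similar, these are the sorts of checks involved; osd.42 and /dev/sdX are placeholders, not our actual IDs/devices:

    # which OSDs are flagged, and how many ops are slow
    ceph health detail | grep -i slow

    # in-flight and recent slow ops on the suspect OSD (run on that OSD's host)
    ceph daemon osd.42 dump_ops_in_flight
    ceph daemon osd.42 dump_historic_slow_ops

    # rule out the disk itself
    smartctl -a /dev/sdX
    dmesg | grep -i sdX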

We have a fair number of VMs running from RBDs, most of them on the SSD pool, but a few on HDD. Most of the VMs are configured with a weekly fstrim cronjob, and QEMU is configured to pass DISCARD commands down to Ceph. One VM, however, which stores a bunch of 50 GB files as part of a Bareos setup (a fork of Bacula), has its filesystem mounted with the discard option, so it trims immediately when files are deleted.

I tracked the SLOW_OPS to a time period during which that VM was recycling (i.e., deleting and trimming) some of these large 50 GB files. In other words, it seems there might be a performance regression in deleting large numbers of RADOS objects at once.
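For context, the discard path looks roughly like this; the device name, mount point, filesystem, and schedule below are illustrative rather than our exact config:

    # libvirt disk definition: pass guest DISCARDs through to the RBD image
    #   <driver name='qemu' type='raw' discard='unmap'/>

    # most VMs: weekly batched trim (cron entry inside the guest)
    @weekly /sbin/fstrim -av

    # the Bareos VM: immediate trim on delete via the mount option (example fstab line)
    /dev/vdb1  /srv/bareos  xfs  defaults,discard  0 0

For scale: assuming the default 4 MB RBD object size, discarding a single 50 GB file covers on the order of 50 GB / 4 MB ≈ 12,800 objects, so each delete turns into a burst of that many object removals on the HDD OSDs.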


