Thank you for your reply.
I'm not sure if this is the case, since we have a rather small cluster and the PGs have at most just over 10k objects (total objects in the cluster is about 9 million). During the 10-minute scrubs we're seeing a steady 10k IOPS on the underlying block devices of the OSDs (enterprise SSDs), both on the primary OSD and on a secondary OSD. It's all read IOPS, and throughput is about 65 MiB/s. I'm not very familiar with the deep-scrub process, but this seems a bit much to me. Can this still be intended behaviour? It would mean the OSD can only check about 15-20 objects per second on SSDs while doing 8k read IOPS. The strange thing is that we didn't see this happen at all before the upgrade; it started right after.
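To put rough numbers on that, here is the back-of-the-envelope arithmetic as a small Python sketch; all inputs are the approximate figures from our monitoring quoted above, nothing more precise than that:

# Back-of-the-envelope check on the deep-scrub numbers above;
# all inputs are approximate values from our monitoring.
objects_per_pg = 10_000      # largest PGs, roughly
scrub_duration_s = 10 * 60   # roughly 10 minutes per deep-scrub
read_iops = 10_000           # observed on the OSD block device
throughput_mib_s = 65        # observed read throughput

objects_per_s = objects_per_pg / scrub_duration_s
reads_per_object = read_iops / objects_per_s
avg_read_kib = throughput_mib_s * 1024 / read_iops

print(f"~{objects_per_s:.0f} objects/s scrubbed")   # ~17
print(f"~{reads_per_object:.0f} reads per object")  # ~600
print(f"~{avg_read_kib:.1f} KiB per read")          # ~6.7

So each object apparently costs hundreds of small reads, which is what feels off to me.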
I also checked the PGs for which the deep-scrub finished in a couple of seconds; most of those have about 5k objects. The PGs for which deep-scrub is causing issues all seem to be part of the RGW bucket index pool.
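For completeness, this is roughly how I correlated the slow PGs with their pools (a quick sketch only; it assumes the Luminous-style JSON output of "ceph pg dump" and "ceph osd lspools", so the field names may need adjusting):

# Sketch: map each PG to its pool and show object counts and last deep-scrub
# times. Field names are assumed from the Luminous JSON output and may differ.
import json
import subprocess

def ceph_json(*args):
    # Run a ceph CLI command and parse its JSON output.
    return json.loads(subprocess.check_output(["ceph", *args, "--format", "json"]))

# The pool id is the part of the PG id before the dot.
pools = {p["poolnum"]: p["poolname"] for p in ceph_json("osd", "lspools")}

for pg in ceph_json("pg", "dump")["pg_stats"]:
    pool_id = int(pg["pgid"].split(".")[0])
    print(pg["pgid"],
          pools.get(pool_id, "unknown"),
          pg["stat_sum"]["num_objects"],
          pg["last_deep_scrub_stamp"])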
Since we're using tenants for RGW, dynamic bucket index resharding didn't work before the Ceph update (http://tracker.ceph.com/issues/22046). After the update it was hammering the cluster quite hard, doing about 30-60k write IOPS on the RGW index pool for two days straight. The resharding list also kept showing almost completely different data every few seconds. Since this was affecting performance as well, we temporarily disabled dynamic resharding. Could this somehow be related?
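In case it's relevant, this is more or less how we watched the reshard queue churn (a rough sketch; it assumes "radosgw-admin reshard list" prints a JSON array and that entries carry a bucket_name field, which may not hold on every release):

# Take two snapshots of the resharding queue a few seconds apart and diff them.
# Assumes `radosgw-admin reshard list` outputs a JSON array with bucket_name.
import json
import subprocess
import time

def reshard_buckets():
    out = subprocess.check_output(["radosgw-admin", "reshard", "list"])
    return {e.get("bucket_name", json.dumps(e, sort_keys=True))
            for e in json.loads(out)}

first = reshard_buckets()
time.sleep(10)
second = reshard_buckets()

print(len(first), "entries, then", len(second), "entries 10 seconds later")
print("disappeared:", sorted(first - second))
print("appeared:   ", sorted(second - first))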
Thanks
Sander
From: Gregory Farnum <gfarnum@xxxxxxxxxx>
Sent: Thursday, June 14, 2018 19:45
To: Sander van Schie / True
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Performance issues with deep-scrub since upgrading from v12.2.2 to v12.2.5

Deep scrub needs to read every object in the PG. If some PGs are only taking 5 seconds they must be nearly empty (or maybe they only contain objects with small amounts of omap or something). Ten minutes is perfectly reasonable, but it is an added load on the cluster as it does all those object reads. Perhaps your configured scrub rates are using enough IOPS that you don't have enough for your client workloads.
-Greg

On Thu, Jun 14, 2018 at 11:37 AM Sander van Schie / True <Sander.vanSchie@xxxxxxx> wrote:
Hello,