An additional note: this was not sudden onset. We tracked these 2 OSDs growing steadily for months until they reached the point where they were 10% fuller than the rest of the cluster.
From: Ceph-large [ceph-large-bounces@xxxxxxxxxxxxxx] on behalf of David Turner [david.turner@xxxxxxxxxxxxxxxx]
Sent: Monday, November 28, 2016 1:36 PM
To: ceph-large@xxxxxxxxxxxxxx
Subject: [Ceph-large] A LOT of Snapshots causing problems

We've been tracking and investigating a few issues: PGs of different sizes leading to OSDs filling unevenly, and the snap_trimq on our PGs never emptying.
We noticed 2 things separately and then realized they were related:

1) We have a very good way to balance our cluster, but in 2 different clusters we had an OSD that was 10% more full than anything else.
2) As monitoring, we query 300 random PGs every 5 minutes and extrapolate what our total snap_trimq would be if those 300 PGs were typical of the 32k PGs in the cluster. We are seeing that 2 of our clusters never get close to catching up on their snap_trimq.

We realized that these 2 clusters are the same clusters and that the problems might be related. A `du -sh` of the PGs shows that the PGs primary to the overly full OSDs are each about 10GB (~30%) larger than the PGs primary to other OSDs in the cluster, and that the snap_trimq on those PGs is at a size that accounts for the extra 10GB.

We have been able to clean up one of the OSDs by setting its snap_trim_sleep to 0.0 from our current setting of 0.25, as well as triggering a reweight to move some data off of the OSD. We're currently testing only adjusting snap_trim_sleep down to 0.0 to fix this problem for future OSDs, and it is looking promising. Lowering it to 0.05 had no noticeable effect.

We are deleting ~5k snapshots every day in these clusters, which have 32k PGs and 1000+ OSDs. We have one cluster with 32k PGs and 957 OSDs that isn't exhibiting this behavior to the same extent yet, although it is no longer getting down to an empty snap_trimq each day.

Does anyone have any theories or experiences with problems like this? Thank you for your help.
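For illustration, a minimal sketch of the kind of sampling check described above might look like the following. This is not our production script; it assumes `ceph pg <pgid> query --format=json` exposes a "snap_trimq" interval string, and the exact JSON shapes vary between releases.

# Rough sketch: sample random PGs and extrapolate the cluster-wide
# snap_trimq backlog, along the lines of the 300-PG check described above.
import json
import random
import subprocess

SAMPLE_SIZE = 300
TOTAL_PGS = 32768  # ~32k PGs in the cluster

def list_pgids():
    # `ceph pg dump pgs_brief` JSON is a list in some releases and a dict
    # with a "pg_stats" key in others; handle both.
    out = subprocess.check_output(
        ['ceph', 'pg', 'dump', 'pgs_brief', '--format=json'])
    dump = json.loads(out.decode('utf-8'))
    stats = dump.get('pg_stats', []) if isinstance(dump, dict) else dump
    return [pg['pgid'] for pg in stats]

def snap_trimq_len(pgid):
    # Best-effort parse of the printed interval set, e.g. "[5f~3,8a~1]",
    # where each entry is start~length in hex.
    out = subprocess.check_output(
        ['ceph', 'pg', pgid, 'query', '--format=json'])
    trimq = json.loads(out.decode('utf-8')).get('snap_trimq', '[]')
    total = 0
    for interval in trimq.strip('[]').split(','):
        if '~' in interval:
            _, length = interval.split('~')
            total += int(length, 16)
    return total

def estimate_backlog():
    pgids = list_pgids()
    sample = random.sample(pgids, min(SAMPLE_SIZE, len(pgids)))
    sampled = sum(snap_trimq_len(pgid) for pgid in sample)
    # Scale the sampled backlog up to the whole cluster.
    return sampled * TOTAL_PGS // max(len(sample), 1)

if __name__ == '__main__':
    print('estimated snap_trimq backlog: %d snaps' % estimate_backlog())

For reference, osd_snap_trim_sleep can be changed at runtime with something along the lines of `ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.0'`, so testing the sleep change doesn't require OSD restarts.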
_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com