Re: Snaptrim making cluster unusable


 



I made the suggested changes.

(Un)fortunately I am no longer able to reproduce the issue, neither with the original settings nor with the updated ones. This may be because the problematic snapshots have now been removed/trimmed. When I make new snapshots of the same volumes, they are (obviously) trimmed within a few seconds without any impact on performance.

I will try to reproduce this again by artificially boosting the snapshot size.
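For anyone who wants to try the same, one hypothetical way to grow a snapshot's delta before deleting it is sketched below. The pool/image/snapshot names are placeholders, not anything from this thread:

```shell
# Snapshot an existing test image (pool "rbd", image "testvol" are placeholders)
rbd snap create rbd/testvol@trimtest

# Overwrite a large amount of data so the snapshot accumulates a big delta;
# deleting the snapshot afterwards forces a correspondingly large snaptrim
rbd bench --io-type write --io-size 4M --io-total 100G rbd/testvol

# Remove the snapshot to trigger the snaptrim load
rbd snap rm rbd/testvol@trimtest
```

This only reproduces load if the writes actually land on blocks covered by the snapshot, so writing well over the image's dirty set is the safe approach.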

For now, would you mind explaining whether and why disabling the write cache is a good idea in general? It feels like having too many layers of cache can be detrimental, and in that case I'd leave it disabled.

Thank you very much!

Pascal



Frank Schilder wrote on 10.01.21 18:56:
- do you have bluefs_buffered_io set to true
No
Try setting it to true.
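A sketch of how that might be applied on an Octopus cluster, assuming the central config database is in use (the osd.0 used for verification is just an example):

```shell
# Apply bluefs_buffered_io to all OSDs via the central config database
ceph config set osd bluefs_buffered_io true

# Confirm the value as seen by one OSD (osd.0 as an example)
ceph config get osd.0 bluefs_buffered_io
```

Depending on the exact release, an OSD restart may be needed before the change takes effect.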

Is there anything specific I can do to check the write cache configuration?
Yes, "smartctl -g wcache DEVICE" will tell you whether the volatile write cache is enabled. If it is, use "smartctl -s wcache,off DEVICE" to disable it. Note that this setting does not persist across reboots. You will find a discussion about how to make it persistent in the list archives.
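Spelled out as a sequence (the device path /dev/sda is a placeholder for the actual OSD device):

```shell
# Query the current volatile write-cache state
# (prints a line like: Write cache is:   Enabled)
smartctl -g wcache /dev/sda

# Disable the volatile write cache; this is generally considered safe on
# drives with power-loss protection (e.g. the Intel D3-S4510 mentioned
# below) and often improves sync-write latency on such drives
smartctl -s wcache,off /dev/sda

# Verify the change took effect
smartctl -g wcache /dev/sda
```

Since the setting is lost on reboot, a common approach is a udev rule or a small systemd unit that reapplies it at boot; the list discussion mentioned above covers the details.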

With both changes, try to enable snaptrim again and report back.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Pascal Ehlert <pascal@xxxxxxxxxxxx>
Sent: 10 January 2021 18:47:40
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re:  Snaptrim making cluster unusable

Hi Frank,

Thanks for getting back!
- ceph version
15.2.6 (now upgraded to 15.2.8 and I was able to reproduce the issue)
- rbd image config (meta- and data pool the same/different?)
We are not using EC but regular replicated pools, so I assume meta and
data pool are the same?
- how many PGs do the affected pools have
512 for a total of 20.95TB of data
- how many PGs per OSD (as stated by ceph osd df tree)
Varying between ~80 to ~220 with the 4TB disks having roughly twice as
many as the 2TB disks
- what type of SSDs, do they have power loss protection, is write cache disabled
Mixed Intel SSDs, one example being Intel® SSD D3-S4510 Series
If this becomes relevant, I can look up the exact models, but I
couldn't pinpoint specific OSDs that struggled.

The disks are connected through standard Intel SATA controllers,
sometimes onboard.
Is there anything specific I can do to check the write cache configuration?

- do you have bluefs_buffered_io set to true
No



Regards,

Pascal



For comparison, we are running daily rolling snapshots on ca. 260 RBD images with separate replicated meta-data and 6+2 EC data pool without any issues. No parameters changed from default. Version is mimic-13.2.10.


=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Pascal Ehlert <pascal@xxxxxxxxxxxx>
Sent: 10 January 2021 18:06:18
To: ceph-users@xxxxxxx
Subject:  Snaptrim making cluster unusable

Hi all,

We are running a small cluster with three nodes and 6-8 OSDs each.
The OSDs are SSDs with sizes from 2 to 4 TB. The CRUSH map is configured
so that all data is replicated to each node.
The Ceph version is 15.2.6.

Today I deleted four snapshots of the same two 400 GB and 500 GB RBD volumes.
Shortly after issuing the delete, I noticed that the cluster became
unresponsive to the extent that almost all our services went down due to
high IO latency.

After a while, I noticed about 20 active snaptrim tasks + another 200 or
so snaptrim_wait.

I tried setting
osd_snap_trim_sleep to 3,
osd_pg_max_concurrent_snap_trims to 1,
rbd_balance_snap_reads to true, and
rbd_localize_snap_reads to true.
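For completeness, the OSD-side settings above can be applied at runtime roughly as follows; injectargs is shown as one option, and the values match those tried here:

```shell
# Throttle snaptrim on all OSDs at runtime
ceph tell 'osd.*' injectargs '--osd_snap_trim_sleep 3'
ceph tell 'osd.*' injectargs '--osd_pg_max_concurrent_snap_trims 1'

# The rbd_* options are client-side and belong in ceph.conf on the
# clients or in the central config database
ceph config set client rbd_balance_snap_reads true
ceph config set client rbd_localize_snap_reads true
```

Note that the rbd_* options only affect how clients read from snapshots; they do not throttle the snaptrim work itself.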


Still, the only way to make the cluster responsive again was to set
osd_pg_max_concurrent_snap_trims to 0, thereby disabling snaptrimming entirely.
I tried a few other options, but whenever snaptrims are running for a
significant number of PGs, the cluster becomes completely unusable.

Are there any other options to throttle snaptrimming that I haven't
tried yet?


Thank you,

Pascal
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



