>> - do you have bluefs_buffered_io set to true
> No

Try setting it to true.

> Is there anything specific I can do to check the write cache configuration?

Yes, "smartctl -g wcache DEVICE" will tell you whether the volatile write cache is enabled. If it is, use "smartctl -s wcache,off DEVICE" to disable it. Note that this setting does not persist across reboots; you will find a discussion about how to make it permanent in the list archives.

With both changes, try to enable snaptrim again and report back.
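
For example (DEVICE is a placeholder for one of your SATA SSDs; the exact output depends on your smartmontools version):

  # show whether the volatile write cache is currently enabled
  smartctl -g wcache DEVICE

  # disable the volatile write cache (does not survive a reboot)
  smartctl -s wcache,off DEVICE

  # switch BlueFS to buffered I/O on all OSDs; depending on the release,
  # the OSDs may need a restart to pick this up
  ceph config set osd bluefs_buffered_io true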
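
When you turn snaptrim back on, a conservative starting point would be something like the following (the values are only a suggestion; relax them once the cluster stays responsive):

  # allow snaptrimming again, at most one in-flight trim per PG
  ceph config set osd osd_pg_max_concurrent_snap_trims 1

  # sleep 3 seconds between trim operations
  ceph config set osd osd_snap_trim_sleep 3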

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Pascal Ehlert <pascal@xxxxxxxxxxxx>
Sent: 10 January 2021 18:47:40
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: Snaptrim making cluster unusable

Hi Frank,

Thanks for getting back!

> - ceph version

15.2.6 (now upgraded to 15.2.8, and I was able to reproduce the issue there)

> - rbd image config (meta- and data pool the same/different?)

We are not using EC but regular replicated pools, so I assume meta and data pool are the same?

> - how many PGs do the affected pools have

512, for a total of 20.95 TB of data

> - how many PGs per OSD (as stated by ceph osd df tree)

Varying between ~80 and ~220, with the 4 TB disks having roughly twice as many as the 2 TB disks

> - what type of SSDs, do they have power loss protection, is write cache disabled

Mixed Intel SSDs, one example being the Intel® SSD D3-S4510 Series. If this becomes relevant, I can look up the exact models, but I couldn't pinpoint specific OSDs that struggled. The disks are connected through standard Intel SATA controllers, sometimes onboard. Is there anything specific I can do to check the write cache configuration?

> - do you have bluefs_buffered_io set to true

No

Regards,
Pascal

> For comparison, we are running daily rolling snapshots on ca. 260 RBD images with a separate replicated metadata pool and a 6+2 EC data pool without any issues. No parameters changed from the defaults. The version is mimic-13.2.10.
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Pascal Ehlert <pascal@xxxxxxxxxxxx>
> Sent: 10 January 2021 18:06:18
> To: ceph-users@xxxxxxx
> Subject: Snaptrim making cluster unusable
>
> Hi all,
>
> We are running a small cluster with three nodes and 6-8 OSDs each. The OSDs are SSDs with sizes from 2 to 4 TB. The CRUSH map is configured so that all data is replicated to each node. The Ceph version is 15.2.6.
>
> Today I deleted 4 snapshots of the same two 400 GB and 500 GB RBD volumes. Shortly after issuing the delete, I noticed the cluster became unresponsive to an extent where almost all our services went down due to high IO latency.
>
> After a while, I noticed about 20 active snaptrim tasks plus another 200 or so in snaptrim_wait.
>
> I tried setting
>    osd_snap_trim_sleep to 3,
>    osd_pg_max_concurrent_snap_trims to 1,
>    rbd_balance_snap_reads to true,
>    rbd_localize_snap_reads to true
>
> Still, the only way to make the cluster responsive again was to set osd_pg_max_concurrent_snap_trims to 0 and thus disable snaptrimming entirely. I tried a few other options, but whenever snaptrims are running for a significant number of PGs, the cluster becomes completely unusable.
>
> Are there any other options for throttling snaptrimming that I haven't tried yet?
>
> Thank you,
> Pascal

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx