When the thread below was first published, my team tried to reproduce the results and couldn't. A couple of factors likely contribute to the differing behavior:

* _Micron 5100_, for example, isn't a single model: the 5100 _Eco_, _Pro_, and _Max_ are different beasts. Similarly, implementation and firmware details vary by drive _size_ as well. The moral of the story is to be careful extrapolating an experience with one specific drive to other models that one might assume are equivalent but aren't.
* For SAS/SATA drives the HBA in use may be a significant factor as well.

Sketches of the commands discussed in the thread follow after the quoted messages.

> SSDs are not equal to high performance: https://yourcmc.ru/wiki/index.php?title=Ceph_performance&mobileaction=toggle_view_desktop
>
> Depending on your model, performance can be very poor.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Pascal Ehlert <pascal@xxxxxxxxxxxx>
> Sent: 10 January 2021 19:19:09
> To: Frank Schilder
> Cc: ceph-users@xxxxxxx
> Subject: Re: Re: Snaptrim making cluster unusable
>
> I made the suggested changes.
>
> (Un)fortunately I am not able to reproduce the issue anymore. Neither with the original settings nor the updated setting.
> This may be due to the fact that the problematic snapshots have been removed/trimmed now.
> When I make new snapshots of the same volumes, they are (obviously) trimmed in a few seconds without an impact on performance.
>
> I will try to reproduce this again by artificially boosting the snapshot size.
>
> For now, would you mind explaining if and why disabling the write cache is a good idea in general?
> It feels that having too many layers of cache can be detrimental and I'd leave it disable then.
>
> Thank you very much!
>
> Pascal
>
>
> Frank Schilder wrote on 10.01.21 18:56:
>>>> - do you have bluefs_buffered_io set to true
>>> No
>> Try setting it to true.
>>
>>> Is there anything specific I can do to check the write cache configuration?
>> Yes, "smartctl -g wcache DEVICE" will tell you if writeback cache is disabled. If not, use "smartctl -s wcache=off DEVICE" to disable it. Note that this setting does not persist reboot. You will find a discussion about how to do that in the list.
>>
>> With both changes, try to enable snaptrim again and report back.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Pascal Ehlert <pascal@xxxxxxxxxxxx>
>> Sent: 10 January 2021 18:47:40
>> To: Frank Schilder
>> Cc: ceph-users@xxxxxxx
>> Subject: Re: Snaptrim making cluster unusable
>>
>> Hi Frank,
>>
>> Thanks for getting back!
>>> - ceph version
>> 15.2.6 (now upgraded to 15.2.8 and I was able to reproduce the issue)
>>> - rbd image config (meta- and data pool the same/different?)
>> We are not using EC but regular replicated pools, so I assume meta and data pool are the same?
>>> - how many PGs do the affected pools have
>> 512 for a total of 20.95TB of data
>>> - how many PGs per OSD (as stated by ceph osd df tree)
>> Varying between ~80 to ~220 with the 4TB disks having roughly twice as many as the 2TB disks
>>> - what type of SSDs, do they have power loss protection, is write cache disabled
>> Mixed Intel SSDs, one example being Intel® SSD D3-S4510 Series
>> If this becomes relevant, I can look up some of the exact model, but I couldn't pinpoint specific OSDs that struggled
>>
>> The disks are connected through standard Intel SATA controllers, sometimes onboard.
>> Is there anything specific I can do to check the write cache configuration?
>>
>>> - do you have bluefs_buffered_io set to true
>> No
>>
>> Regards,
>>
>> Pascal
>>
>>> For comparison, we are running daily rolling snapshots on ca. 260 RBD images with separate replicated meta-data and 6+2 EC data pool without any issues. No parameters changed from default. Version is mimic-13.2.10.
>>>
>>> =================
>>> Frank Schilder
>>> AIT Risø Campus
>>> Bygning 109, rum S14
>>>
>>> ________________________________________
>>> From: Pascal Ehlert <pascal@xxxxxxxxxxxx>
>>> Sent: 10 January 2021 18:06:18
>>> To: ceph-users@xxxxxxx
>>> Subject: Snaptrim making cluster unusable
>>>
>>> Hi all,
>>>
>>> We are running a small cluster with three nodes and 6-8 OSDs each.
>>> The OSDs are SSDs with sizes from 2 to 4 TB. Crush map is configured so all data is replicated to each node.
>>> The Ceph version is Ceph 15.2.6.
>>>
>>> Today I deleted 4 Snapshots of the same two 400GB and 500GB rbd volumes.
>>> Shortly after issuing the delete, I noticed the cluster became unresponsive to an extend where almost all our services went down due high IO latency.
>>>
>>> After a while, I noticed about 20 active snaptrim tasks + another 200 or so snaptrim_wait.
>>>
>>> I tried setting
>>> osd_snap_trim_sleep to 3,
>>> osd_pg_max_concurrent_snap_trims to 1
>>> rbd_balance_snap_reads to true,
>>> rbd_localize_snap_reads to true
>>>
>>> Still the only way to make the cluster responsive again was to set osd_pg_max_concurrent_snap_trims to 0 and thus disable snaptrimming.
>>> I tried a few other options, but whenever snaptrims are running for a significant number of PGs, the cluster becomes completely unusable.
>>>
>>> Are there any other options to throttle snaptrimming for that I haven't tried, yet?
>>>
>>> Thank you,
>>>
>>> Pascal
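
For reference, here is how the throttles discussed in the thread might be applied cluster-wide. This is a minimal sketch assuming an Octopus-era cluster managed through the centralized config database (`ceph config set`); the values are the ones tried in the thread, not tuned recommendations.

```sh
# Sketch: snaptrim-related settings discussed in the thread.
# Values are taken from the thread, not recommendations; adjust to taste.

# Sleep between snaptrim operations and limit concurrent trims per OSD.
ceph config set osd osd_snap_trim_sleep 3
ceph config set osd osd_pg_max_concurrent_snap_trims 1

# Frank's other suggestion: buffered reads for BlueFS/RocksDB.
# Depending on the release this may only take effect after an OSD restart.
ceph config set osd bluefs_buffered_io true

# Emergency brake used in the thread: stop snaptrim entirely until resolved.
# ceph config set osd osd_pg_max_concurrent_snap_trims 0

# Push a value to running OSDs immediately, if needed:
# ceph tell 'osd.*' injectargs '--osd_snap_trim_sleep 3'
```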
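
And a similar sketch for the write-cache check Frank describes, assuming smartmontools is installed (hdparm/sdparm are shown as transport-specific alternatives). Note that recent smartctl man pages document the set syntax as `wcache,off` with a comma rather than the `wcache=off` form in the mail; check `man smartctl` for your version. The change is volatile and is lost after a power cycle.

```sh
#!/usr/bin/env bash
# Sketch: inspect and disable the volatile write cache on a SATA/SAS drive.
# Assumes smartmontools is installed; hdparm/sdparm lines are alternatives.
set -euo pipefail

DEV="${1:-/dev/sdX}"   # pass the OSD's data device as the first argument

# Show whether the drive's write cache is currently enabled.
smartctl -g wcache "$DEV"

# Disable the volatile write cache (lost again after a power cycle/reboot).
smartctl -s wcache,off "$DEV"

# Transport-specific alternatives:
# hdparm -W0 "$DEV"            # SATA
# sdparm --set=WCE=0 "$DEV"    # SAS

# Persistence across reboots is usually handled with a udev rule or a
# boot-time script; see the list discussion referenced in the thread.
```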