Re: Recommendation for decent write latency performance from HDDs

Going to resurrect this thread to provide another option:

LVM cache, i.e. putting a cache device in front of the BlueStore LVM LV.
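Conceptually, a cache device in writeback mode in front of the LV behaves roughly like the toy Python model below. Writes land on the fast device and are marked dirty, reads are served from the fast device when they can be, and dirty blocks are copied down to the HDD later. The class and its names are hypothetical. Real dm-cache/lvmcache works on fixed-size chunks with its own promotion, eviction and flushing policies, and none of that is modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class WritebackCache:
    """Toy model of a writeback cache in front of a slow device.

    Hypothetical illustration only: real dm-cache/lvmcache works on
    fixed-size chunks with its own promotion and eviction policies.
    """

    slow: dict = field(default_factory=dict)    # stands in for the HDD-backed LV
    cached: dict = field(default_factory=dict)  # blocks held on the fast device
    dirty: set = field(default_factory=set)     # cached blocks not yet on the HDD
    hits: int = 0
    misses: int = 0

    def write(self, block: int, data: bytes) -> None:
        # Writes go to the fast device first and are marked dirty.
        self.cached[block] = data
        self.dirty.add(block)

    def read(self, block: int) -> bytes:
        if block in self.cached:
            self.hits += 1
            return self.cached[block]
        # Cache miss: serve from the slow device and keep a copy.
        self.misses += 1
        data = self.slow[block]
        self.cached[block] = data
        return data

    def flush(self) -> int:
        # Later, dirty blocks are copied down to the slow device.
        flushed = len(self.dirty)
        for block in self.dirty:
            self.slow[block] = self.cached[block]
        self.dirty.clear()
        return flushed
```

A write is acknowledged before the HDD has seen it, which is where the latency benefit comes from, and also why the cache device has to keep up with flushing.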

I only mention this because I noticed it in the SUSE documentation for SES6 (based on Nautilus) here: https://documentation.suse.com/ses/6/html/ses-all/lvmcache.html

  •  If you plan to use a fast drive as an LVM cache for multiple OSDs, be aware that all OSD operations (including replication) will go through the caching device. All reads will be queried from the caching device, and are only served from the slow device in case of a cache miss. Writes are always applied to the caching device first, and are flushed to the slow device at a later time ('writeback' is the default caching mode).
  • When deciding whether to utilize an LVM cache, verify whether the fast drive can serve as a front for multiple OSDs while still providing an acceptable amount of IOPS. You can test it by measuring the maximum amount of IOPS that the fast device can serve, and then dividing the result by the number of OSDs behind the fast device. If the result is lower than or close to the maximum amount of IOPS that the OSD can provide without the cache, LVM cache is probably not suited for this setup.
  • The interaction of the LVM cache device with OSDs is important. Writes are periodically flushed from the caching device to the slow device. If the incoming traffic is sustained and significant, the caching device will struggle to keep up with incoming requests as well as the flushing process, resulting in performance drop. Unless the fast device can provide much more IOPS with better latency than the slow device, do not use LVM cache with a sustained high volume workload. Traffic in a burst pattern is more suited for LVM cache as it gives the cache time to flush its dirty data without interfering with client traffic. For a sustained low traffic workload, it is difficult to guess in advance whether using LVM cache will improve performance. The best test is to benchmark and compare the LVM cache setup against the WAL/DB setup. Moreover, as small writes are heavy on the WAL partition, it is suggested to use the fast device for the DB and/or WAL instead of an LVM cache.
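The IOPS rule of thumb in the second bullet can be written down as a quick check. This is only a sketch. The function names and the `margin` parameter (how far above the HDD-backed OSD's own IOPS the per-OSD share must be before it stops counting as "close to") are assumptions, not anything SUSE specifies. The inputs should come from your own benchmarks of the fast device and of one HDD-backed OSD.

```python
def per_osd_cache_iops(fast_device_iops: float, osds_behind_cache: int) -> float:
    """Share of the fast device's measured IOPS available to each OSD."""
    if osds_behind_cache <= 0:
        raise ValueError("need at least one OSD behind the cache device")
    return fast_device_iops / osds_behind_cache


def lvm_cache_probably_suited(fast_device_iops: float,
                              osds_behind_cache: int,
                              osd_iops_without_cache: float,
                              margin: float = 2.0) -> bool:
    """Return False when the per-OSD share is lower than or close to the
    IOPS an OSD already gets without the cache.

    `margin` is a hypothetical knob: "close to" is read here as
    "not more than margin times the uncached OSD IOPS".
    """
    share = per_osd_cache_iops(fast_device_iops, osds_behind_cache)
    return share > margin * osd_iops_without_cache


if __name__ == "__main__":
    # Hypothetical figures, not measurements from this thread.
    print(lvm_cache_probably_suited(200_000, 3, 150))
    print(lvm_cache_probably_suited(2_000, 12, 150))
```

Passing this check only says the cache device has IOPS headroom. Per the third bullet, a sustained heavy write load can still saturate it while it flushes, so a benchmark against the WAL/DB layout is still the real test.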

So it sounds like you could partition your NVMe for LVM cache, DB/WAL, or both?
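If you go for "both", the arithmetic is just dividing each NVMe among the OSDs it fronts. The sketch below assumes 3 HDD OSDs per NVMe (e.g. 12 HDDs over the 4 NVMe slots; the HDD count is an assumption), hypothetical drive sizes, and a block.db of 4% of the HDD size. That 4% is a commonly quoted rule of thumb rather than something stated in this thread, so check the BlueStore sizing guidance for your release. `split_nvme` is a hypothetical helper.

```python
from typing import Tuple


def split_nvme(nvme_gb: float, osds_per_nvme: int, hdd_gb: float,
               db_fraction: float = 0.04) -> Tuple[float, float]:
    """Return (db_gb_per_osd, cache_gb_per_osd) for one NVMe device.

    db_fraction is an assumed sizing rule (block.db as a fraction of the
    HDD size); whatever capacity is left is split evenly for LVM cache.
    """
    if osds_per_nvme <= 0:
        raise ValueError("osds_per_nvme must be positive")
    db_per_osd = hdd_gb * db_fraction
    leftover = nvme_gb - db_per_osd * osds_per_nvme
    if leftover < 0:
        raise ValueError("NVMe too small for the requested block.db size")
    return db_per_osd, leftover / osds_per_nvme


if __name__ == "__main__":
    # Hypothetical: 1.6 TB NVMe fronting three 12 TB HDD OSDs.
    db, cache = split_nvme(1600, 3, 12_000)
    print(f"block.db {db:.0f} GB, cache {cache:.1f} GB per OSD")
```

With these example numbers, block.db takes most of the device and only about 53 GB per OSD is left for cache.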

This seemed a bit closer to what you were looking for in your original post. I don't use it myself, but figured I would share it.

Reed

On Apr 4, 2020, at 9:12 AM, jesper@xxxxxxxx wrote:

Hi.

We need "bulk" storage, but with decent write latencies.
Normally we would do this with DAS on RAID 5 with a 2 GB battery-backed
write cache (BBWC) in front. The goal is to keep it as cheap as possible
while still getting the scalability of Ceph.

In our "first" Ceph cluster we did the same, just putting BBWC controllers
in the OSD nodes, and we're fine. But now we're onto the next one, and
systems like
https://www.supermicro.com/en/products/system/1U/6119/SSG-6119P-ACR12N4L.cfm
do not support a RAID controller like that, even though they are branded
for "Ceph Storage Solutions".

It does, however, support 4 NVMe slots in the front, so some level of
"tiering" using the NVMe drives seems to be what is "suggested". But what
do people do? What is recommended? I see multiple options:

Ceph tiering at the pool layer:
https://docs.ceph.com/docs/master/rados/operations/cache-tiering/
And rumors that it is "deprecated":
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2.0/html/release_notes/deprecated_functionality

Pro: Abstract layer
Con: Deprecated? - Lots of warnings?

Offloading block.db to NVMe/SSD:
https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/

Pro: Easy to deal with; seems heavily supported.
Con: As far as I can tell, this will only benefit the metadata of the
OSD, not the actual data. Thus a data commit to the OSD will still be
dominated by the write latency of the underlying, very slow HDD.

Bcache:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027713.html

Pro: Closest to the BBWC mentioned above, but with far larger cache
sizes.
Con: It is hard to tell whether I would end up being the only one on the
planet using this solution.

Eat it: writes will be as slow as hitting dead rust, and anything that
cannot live with that needs to be entirely on SSD/NVMe.
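For a sense of scale on the "eat it" option: a random write that has to reach the platter pays an average seek plus, on average, half a rotation. The sketch assumes a 7200 RPM drive and an 8 ms average seek (both hypothetical figures; check the drive's datasheet). It ignores the drive's own write cache and the extra network round trips of Ceph replication.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    # One revolution takes 60_000 / rpm ms; on average we wait half of it.
    return 0.5 * 60_000.0 / rpm


def random_write_latency_ms(rpm: float, avg_seek_ms: float) -> float:
    # Seek to the track, then wait for the sector to come round.
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


if __name__ == "__main__":
    lat = random_write_latency_ms(7200, 8.0)  # hypothetical drive figures
    print(f"~{lat:.1f} ms per write, ~{1000 / lat:.0f} IOPS at queue depth 1")
```

That is on the order of ten milliseconds per write, compared with a battery-backed controller cache that acknowledges as soon as the write is in its protected memory, which is the gap the options above try to close.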

Other?

Thanks for your input.

Jesper
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
