Hi Michael,

yes, at least I'm interested. I also plan to use dm-cache and would like to hear about your continued experience. I have a few specific questions about your set-up:

- Sizing: Why do you use only 85 GiB for the cache? Do you also deploy OSDs on the remaining space on the NVMe, or is it used for something else?

- Rotational flag: There was a longer thread about bcache. When an OSD gets created, it is recognized as rotational by Ceph. Then bcache is added, and on the next restart the OSD shows up as non-rotational. This leads to non-optimal bluestore parameters being applied and hits performance hard. How is that in your set-up? Does Ceph/the OS show the OSDs as rotational or not? How were they classified on creation? (For reference, I have appended a small check sketch below the quoted mail.)

- How deployed (LVM + dm-cache first, then OSD, or the other way around): This is related to the previous question, because the order might be important. My guess is that after attaching the cache, the rotational flag of the cache device is inherited. If that happens before OSD creation, the OSD will be created with SSD tunings, otherwise with HDD tunings. How did you do it? (A sketch of the attach step I have in mind is also appended below.)

- Why read-only, and which cache mode: I guess the main reason for read-only is the small size of the cache device compared with the 4 TB HDDs. Which cache mode did you select for dm-cache? Nevertheless: did you try a read-write mode?

- Power-loss test: Did you do some power-loss tests to see whether OSDs can end up corrupted?

Thanks for sharing your experience!

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Michael Lipp <mnl@xxxxxx>
Sent: Wednesday, January 31, 2024 6:23 PM
To: ceph-users@xxxxxxx
Subject: Successfully using dm-cache

Just in case anybody is interested: Using dm-cache works and boosts performance -- at least for my use case.

The "challenge" was to get 100 (identical) Linux VMs started on a three-node hyperconverged cluster. The hardware is nothing special: each node has a Supermicro server board with a single 24-core CPU and 4 x 4 TB hard disks. And there's that extra 1 TB NVMe...

I know that the general recommendation is to use the NVMe for WAL and metadata, but this didn't seem appropriate for my use case, and I'm still not quite sure about failure scenarios with that configuration. So instead I made each drive a logical volume (managed by an OSD) and added 85 GiB of NVMe to each LV as a read-only cache.

Each VM uses as its system disk an RBD image based on a snapshot of the master image. The idea was that with this configuration all VMs should share most (actually almost all) of the data on their system disk, and that this data should be served from the cache.

Well, it works. When booting the 100 VMs, almost all read operations are satisfied from the cache. So I get close to NVMe speed but have paid for conventional hard drives only (well, SSDs aren't that much more expensive nowadays, but the hardware is 4 years old).

So, nothing sophisticated, but as I couldn't find anything about this kind of setup, it might be of interest nevertheless.

 - Michael
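
Appendix to the rotational-flag question above: a minimal, untested sketch of how one could compare the kernel's view of each backing device with what Ceph recorded in the OSD metadata. The OSD-to-device mapping is only an example for a 4-disk node, and the metadata keys are filtered by name because their exact names differ between releases.

#!/usr/bin/env python3
# Untested sketch: compare the kernel's rotational flag of each OSD's backing
# device with whatever Ceph recorded in the OSD metadata.

import json
import subprocess
from pathlib import Path

def kernel_rotational(dev: str) -> str:
    # /sys/block/<dev>/queue/rotational is '1' for HDDs, '0' for SSD/NVMe.
    return Path(f"/sys/block/{dev}/queue/rotational").read_text().strip()

def ceph_rotational(osd_id: int) -> dict:
    # 'ceph osd metadata <id>' prints JSON; collect every key that mentions
    # "rotational" (e.g. bluestore_bdev_rotational) to see what was recorded.
    out = subprocess.check_output(["ceph", "osd", "metadata", str(osd_id)])
    md = json.loads(out)
    return {k: v for k, v in md.items() if "rotational" in k}

if __name__ == "__main__":
    # Example OSD-id-to-device mapping; adjust to the actual layout.
    for osd_id, dev in {0: "sda", 1: "sdb", 2: "sdc", 3: "sdd"}.items():
        print(f"osd.{osd_id}: kernel={kernel_rotational(dev)}, "
              f"ceph={ceph_rotational(osd_id)}")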
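
And the attach step I have in mind for the deployment-order question, again only an untested sketch: the VG/LV names, the NVMe partition and the writethrough cache mode are my assumptions, not necessarily what you used. The interesting part is whether this runs before or after "ceph-volume lvm create", because that determines which rotational flag the OSD records.

#!/usr/bin/env python3
# Untested sketch of attaching an NVMe-backed cache to an OSD's origin LV,
# using the classic LVM cache-pool route. All names are assumptions.

import subprocess

VG = "vg_osd_sda"            # VG holding one 4 TB origin LV (assumed);
                             # the NVMe partition must be a PV in the same VG
ORIGIN_LV = "osd_block"      # the LV handed to the OSD (assumed)
NVME_PV = "/dev/nvme0n1p1"   # one 85 GiB slice of the shared NVMe (assumed)

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

# 1. Carve an 85 GiB cache pool out of the NVMe partition.
run(["lvcreate", "-y", "--type", "cache-pool", "-L", "85G",
     "-n", f"{ORIGIN_LV}_cache", VG, NVME_PV])

# 2. Attach it to the origin LV. Writethrough keeps the HDD authoritative,
#    which is the closest thing to a "read-only" cache that I know of.
run(["lvconvert", "-y", "--type", "cache",
     "--cachepool", f"{VG}/{ORIGIN_LV}_cache",
     "--cachemode", "writethrough",
     f"{VG}/{ORIGIN_LV}"])

The reason I would pick writethrough here: reads get promoted to the NVMe, but writes are only acknowledged once they have hit the HDD, so I would expect losing the cache device to be survivable. Whether that also holds for your power-loss scenario is exactly what I am asking about.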