Hi Michael,

yes, at least I'm interested. I also plan to use dm-cache and would like to hear about your continued experience. I have a few specific questions about your set-up:

- Sizing: Why do you use only 85 GiB for the cache? Do you also deploy OSDs on the remaining space on the NVMe, or is it used for something else?

- Rotational flag: There was a longer thread about bcache. When an OSD gets created, it is recognized as rotational by Ceph. Then bcache is added, and on the next restart the OSD shows up as non-rotational. This leads to non-optimal bluestore parameters being applied and hits performance hard. How is that in your set-up? Does Ceph/the OS show the OSDs as rotational or not? How were they classified on creation? (For reference, I have appended a small check sketch below the quoted mail.)

- How deployed (LVM + dm-cache first, then OSD, or the other way around): This is related to the previous question, because the order might be important. My guess is that after attaching the cache, the rotational flag of the cache device is inherited. If that happens before OSD creation, the OSD will be created with SSD tunings, otherwise with HDD tunings. How did you do it? (A sketch of the attach step I have in mind is also appended below.)

- Why read-only, and which cache mode: I guess the main reason for read-only is the small size of the cache device compared with the 4 TB HDDs. Which cache mode did you select for dm-cache? Nevertheless: did you try a read-write mode?

- Power-loss test: Did you do some power-loss tests to see whether OSDs can end up corrupted?

Thanks for sharing your experience!

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Michael Lipp <mnl@xxxxxx>
Sent: Wednesday, January 31, 2024 6:23 PM
To: ceph-users@xxxxxxx
Subject: Successfully using dm-cache

Just in case anybody is interested: Using dm-cache works and boosts performance -- at least for my use case.

The "challenge" was to get 100 (identical) Linux VMs started on a three-node hyperconverged cluster. The hardware is nothing special: each node has a Supermicro server board with a single 24-core CPU and 4 x 4 TB hard disks. And there's that extra 1 TB NVMe...

I know that the general recommendation is to use the NVMe for WAL and metadata, but this didn't seem appropriate for my use case, and I'm still not quite sure about failure scenarios with that configuration. So instead I made each drive a logical volume (managed by an OSD) and added 85 GiB of NVMe to each LV as a read-only cache.

Each VM uses as its system disk an RBD image based on a snapshot of the master image. The idea was that with this configuration all VMs should share most (actually almost all) of the data on their system disk, and that this data should be served from the cache.

Well, it works. When booting the 100 VMs, almost all read operations are satisfied from the cache. So I get close to NVMe speed but have paid for conventional hard drives only (well, SSDs aren't that much more expensive nowadays, but the hardware is 4 years old).

So, nothing sophisticated, but as I couldn't find anything about this kind of setup, it might be of interest nevertheless.

 - Michael
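
Appendix to the rotational-flag question above: a minimal, untested sketch of how one could compare the kernel's view of each backing device with what Ceph recorded in the OSD metadata. The OSD-to-device mapping is only an example for a 4-disk node, and the metadata keys are filtered by name because their exact names differ between releases.

#!/usr/bin/env python3
# Untested sketch: compare the kernel's rotational flag of each OSD's backing
# device with whatever Ceph recorded in the OSD metadata.

import json
import subprocess
from pathlib import Path

def kernel_rotational(dev: str) -> str:
    # /sys/block/<dev>/queue/rotational is '1' for HDDs, '0' for SSD/NVMe.
    return Path(f"/sys/block/{dev}/queue/rotational").read_text().strip()

def ceph_rotational(osd_id: int) -> dict:
    # 'ceph osd metadata <id>' prints JSON; collect every key that mentions
    # "rotational" (e.g. bluestore_bdev_rotational) to see what was recorded.
    out = subprocess.check_output(["ceph", "osd", "metadata", str(osd_id)])
    md = json.loads(out)
    return {k: v for k, v in md.items() if "rotational" in k}

if __name__ == "__main__":
    # Example OSD-id-to-device mapping; adjust to the actual layout.
    for osd_id, dev in {0: "sda", 1: "sdb", 2: "sdc", 3: "sdd"}.items():
        print(f"osd.{osd_id}: kernel={kernel_rotational(dev)}, "
              f"ceph={ceph_rotational(osd_id)}")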
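
And the attach step I have in mind for the deployment-order question, again only an untested sketch: the VG/LV names, the NVMe partition and the writethrough cache mode are my assumptions, not necessarily what you used. The interesting part is whether this runs before or after "ceph-volume lvm create", because that determines which rotational flag the OSD records.

#!/usr/bin/env python3
# Untested sketch of attaching an NVMe-backed cache to an OSD's origin LV,
# using the classic LVM cache-pool route. All names are assumptions.

import subprocess

VG = "vg_osd_sda"            # VG holding one 4 TB origin LV (assumed);
                             # the NVMe partition must be a PV in the same VG
ORIGIN_LV = "osd_block"      # the LV handed to the OSD (assumed)
NVME_PV = "/dev/nvme0n1p1"   # one 85 GiB slice of the shared NVMe (assumed)

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

# 1. Carve an 85 GiB cache pool out of the NVMe partition.
run(["lvcreate", "-y", "--type", "cache-pool", "-L", "85G",
     "-n", f"{ORIGIN_LV}_cache", VG, NVME_PV])

# 2. Attach it to the origin LV. Writethrough keeps the HDD authoritative,
#    which is the closest thing to a "read-only" cache that I know of.
run(["lvconvert", "-y", "--type", "cache",
     "--cachepool", f"{VG}/{ORIGIN_LV}_cache",
     "--cachemode", "writethrough",
     f"{VG}/{ORIGIN_LV}"])

The reason I would pick writethrough here: reads get promoted to the NVMe, but writes are only acknowledged once they have hit the HDD, so I would expect losing the cache device to be survivable. Whether that also holds for your power-loss scenario is exactly what I am asking about.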