On 2020-09-17 18:36, Mathias Lindberg wrote:
> Hi,
>
> We have a 1.2PB Nautilus installation primarily using CephFS for our
> HPC-resources.
> Our OSD’s have spinning disks and NVMe devices for WAL and DB in an
> LVM-setup.
>
> The CephFS metadata pool resides on spinning disks, and I wonder if
> there is any point from a performance perspective to put that on flash?
> Trying to google that does not provide for a single straight answer,
> some say it is heavily cached on the MDS and does not benefit that much
> from flash while others argue it is significantly faster putting it on
> flash.

It depends on how many SSDs you want to add, and on whether you use them only for CephFS metadata or not. If you have a lot of dirs / files in your cluster, then you will also have a lot of OMAP / META data. If you put that data on a few SSDs, those SSDs will become a bottleneck, because the RocksDB database on them will become large.

On Mimic we have seen that this can become a big issue. We moved all metadata to a subset (25%) of the cluster (NVMe) and that did not work out well, so we ended up reverting the CRUSH change. Be aware that during such a change your OSDs will consume a lot of CPU.

TL;DR: I don't think you want to do this. I think it's better to have the RocksDB data spread out across the cluster, especially as you already have NVMe for WAL and DB. Maybe things have changed in Nautilus, but I wouldn't bet on it.

> We would regardless like to introduce flash OSD’s.
> In order to do that and not have data moving on to the ssd OSD’s we
> add, we need to add crush map rules that place data on ssd and hdd
> exclusively.
>
> But from reading previous posts to the list adding rules like that would
> trigger PG’s to start migrating.
>
> But when I (reluctantly) manually edit the crush map, adding a new class
> ssd (using a previously unused id) and adding the new rules I need
> crushtool seems to think no data will shuffle and that maps are
> equivalent. Is it that simple, or am I doing it wrong?

That could work: https://blog.widodh.nl/2019/02/comparing-two-ceph-crush-maps/

If crushtool doesn't see a difference, it should work out.

> I would have preferred to have used some sort of tool like the crushtool
> reclassify (if applicable) to make these changes, but I can’t get the
> hang of that at all.

Try it out on a (virtual) test cluster if you can (you can manually assign device classes to OSDs). Better to test and understand than to put yourself (and the cluster) under a lot of stress.

Gr. Stefan
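
To illustrate what "no difference between two CRUSH maps" means in practice, here is a minimal sketch in Python. It is not the method from the blog post above, and crushtool's own comparison remains the thing to trust. The sketch assumes you have dumped the test mappings of the old and the new map to text (for example with `crushtool --test --show-mappings`) and that each line looks like `CRUSH rule 0 x 17 [4,12,9]`; that line format and the function names `parse_mappings` and `diff_mappings` are assumptions made for this example.

```python
import re

# Assumed line format of a mapping dump (an assumption, verify on your version):
#   CRUSH rule 0 x 17 [4,12,9]
MAPPING_RE = re.compile(r"^CRUSH rule (\d+) x (\d+) \[([^\]]*)\]")


def parse_mappings(text):
    """Return {(rule_id, input_x): [osd ids]} for every mapping line in text."""
    mappings = {}
    for line in text.splitlines():
        match = MAPPING_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a mapping line
        rule_id = int(match.group(1))
        input_x = int(match.group(2))
        osds = [int(o) for o in match.group(3).split(",") if o.strip()]
        mappings[(rule_id, input_x)] = osds
    return mappings


def diff_mappings(before_text, after_text):
    """Return {(rule_id, input_x): (old_osds, new_osds)} for changed mappings."""
    before = parse_mappings(before_text)
    after = parse_mappings(after_text)
    changed = {}
    for key in sorted(set(before) | set(after)):
        if before.get(key) != after.get(key):
            changed[key] = (before.get(key), after.get(key))
    return changed


if __name__ == "__main__":
    # Made-up sample dumps, not output from a real cluster.
    before = "CRUSH rule 0 x 0 [1,2,3]\nCRUSH rule 0 x 1 [3,4,5]\n"
    after = "CRUSH rule 0 x 0 [1,2,3]\nCRUSH rule 0 x 1 [3,4,6]\n"
    for (rule_id, input_x), (old, new) in diff_mappings(before, after).items():
        print(f"rule {rule_id} x {input_x}: {old} -> {new}")
```

An empty result means every tested input maps to the same OSDs under both maps, so no data should move for the rules that were tested. Any entry in the result is a placement that would change once the new map is injected.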