Re: Introduce flash OSD's to Nautilus installation

On 2020-09-17 18:36, Mathias Lindberg wrote:
> Hi,
> 
> We have a 1.2PB Nautilus installation primarily using CephFS for our
> HPC resources.
> Our OSDs have spinning disks and NVMe devices for WAL and DB in an
> LVM setup.
> 
> The CephFS metadata pool resides on spinning disks, and I wonder if
> there is any point, from a performance perspective, in putting that on
> flash? Googling this does not give a single straight answer: some say it
> is heavily cached on the MDS and does not benefit that much from flash,
> while others argue it is significantly faster to put it on flash.

It depends on how many SSDs you want to add, and whether you would use
them only for CephFS metadata or not. If you have a lot of dirs / files
in your cluster, then you will also have a lot of OMAP / META data, and
the RocksDB databases will become large. If you put all of that data on
a few SSDs, they will become a bottleneck. On Mimic we have seen this
become a big issue: we moved all metadata to a subset (25%) of the
cluster (NVMe) and that did not work out well, so we ended up reverting
the CRUSH change. During that migration the OSDs also consume a lot of
CPU.

TL;DR: I don't think you want to do this; it's better to have the
RocksDB data spread out across the cluster, especially as you already
have NVMe for WAL and DB. Maybe things have changed in Nautilus, but I
wouldn't bet on it.
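
For reference, if you do decide to try it anyway: pinning the CephFS
metadata pool to flash comes down to a device-class CRUSH rule. A
minimal sketch, assuming the metadata pool is called cephfs_metadata and
the flash OSDs carry the ssd device class (adjust the names to your
setup):

  # replicated rule that only selects OSDs of class "ssd",
  # rooted at "default" with host as the failure domain
  ceph osd crush rule create-replicated meta-ssd default host ssd

  # point the metadata pool at the new rule; this starts data movement
  ceph osd pool set cephfs_metadata crush_rule meta-ssd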


> We would regardless like to introduce flash OSDs.
> In order to do that, and not have data moving onto the SSD OSDs we add,
> we need to add CRUSH map rules that place data exclusively on ssd and
> hdd.
> 
> But from reading previous posts to the list, adding rules like that
> would trigger PGs to start migrating.
> 
> But when I (reluctantly) manually edit the CRUSH map, adding a new class
> ssd (using a previously unused id) and adding the new rules I need,
> crushtool seems to think no data will shuffle and that the maps are
> equivalent. Is it that simple, or am I doing it wrong?

That could work:
https://blog.widodh.nl/2019/02/comparing-two-ceph-crush-maps/

If crushtool doesn't see a difference, it should work out.
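
Roughly the workflow from that post, as a sketch (file names are just
examples):

  # dump the current CRUSH map and decompile it to text
  ceph osd getcrushmap -o crushmap.orig
  crushtool -d crushmap.orig -o crushmap.txt

  # edit crushmap.txt (new class id, new rules), then recompile it
  crushtool -c crushmap.txt -o crushmap.new

  # let crushtool compare both maps and report expected data movement
  crushtool -i crushmap.new --compare crushmap.orig

  # only if the difference is acceptable, inject the new map
  ceph osd setcrushmap -i crushmap.new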


> I would have preferred to use some sort of tool like crushtool
> reclassify (if applicable) to make these changes, but I can’t get the
> hang of that at all.

Try it out on a (virtual) test cluster if you can (you can manually
assign device classes to OSDs). Better to test and understand than to
put yourself (and the cluster) under a lot of stress.
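
For the reclassify part: the upstream docs describe a workflow for
converting an old map to device classes without data movement. A sketch,
assuming the existing spinners should become class hdd under a root
called default (names taken from the docs, not from your cluster):

  # on a test cluster you can (re)assign device classes by hand, e.g.:
  ceph osd crush rm-device-class osd.0
  ceph osd crush set-device-class hdd osd.0

  # mark everything under "default" as hdd and rewrite the old rules
  # into class-aware ones
  crushtool -i crushmap.orig \
      --set-subtree-class default hdd \
      --reclassify --reclassify-root default hdd \
      -o crushmap.reclassified

  # verify the adjusted map sends (nearly) all PGs to the same OSDs
  crushtool -i crushmap.orig --compare crushmap.reclassified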

Gr. Stefan