Re: Moving devices to a different device class?


Ah, our old friend the P5316.

A few things to remember about these:

* The 64KB indirection unit (IU) means that you'll burn through endurance if you do a lot of writes smaller than that.  The firmware will try to coalesce smaller writes, especially if they're sequential.  You probably want to keep your RGW / CephFS index / metadata pools on other media.

* With Quincy or later and a reasonably recent kernel you can set bluestore_use_optimal_io_size_for_min_alloc_size to true and OSDs deployed on these should automatically be created with a 64KB min_alloc_size.  If you're writing a lot of objects smaller than, say, 256KB -- especially if using EC -- a more nuanced approach may be warranted.  ISTR that your data are large sequential files, so probably you can exploit this.  For sure you want these OSDs to not have the default 4KB min_alloc_size; that would result in lowered write performance and especially endurance burn.  The min_alloc_size cannot be changed after an OSD is created; instead one would need to destroy and recreate.
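To make the two points above concrete, here is a minimal sketch. The helper names are hypothetical. The first function just does the worst-case arithmetic for a small write landing on a 64KB IU, assuming an aligned write and no firmware coalescing. The second sets the option named above at the `osd` level and reads it back; it only affects OSDs created afterwards.

```shell
# Bytes the drive must physically rewrite for one write of SIZE bytes,
# assuming an aligned write, no firmware coalescing, and an indirection
# unit of IU bytes (default 65536, i.e. the 64KB IU discussed above).
iu_bytes_written() {
    size=$1
    iu=${2:-65536}
    # Round the write up to a whole number of IUs.
    echo $(( (size + iu - 1) / iu * iu ))
}

# Ask newly created OSDs to derive min_alloc_size from the device's
# optimal I/O size. Existing OSDs keep the value they were created with.
enable_optimal_min_alloc() {
    ceph config set osd bluestore_use_optimal_io_size_for_min_alloc_size true || return 1
    ceph config get osd bluestore_use_optimal_io_size_for_min_alloc_size
}

# Example: iu_bytes_written 4096    prints 65536 (16x the 4KB requested)
```

It may also be worth checking what the kernel reports in /sys/block/<device>/queue/optimal_io_size before deploying, since the option relies on that value.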

See also: "Optimizing RGW Object Storage Mixed Media through Storage Classes and Lua Scripting".

> On Oct 24, 2023, at 11:42, Matt Larson <larsonmattr@xxxxxxxxx> wrote:
> I am looking to create a new pool that would be backed by a particular set
> of drives that are larger NVMe SSDs (Intel SSDPF2NV153TZ, 15TB drives).
> Particularly, I am wondering about what is the best way to move devices
> from one pool and to direct them to be used in a new pool to be created. In
> this case, the documentation suggests I could want to assign them to a new
> device-class and have a placement rule that targets that device-class in
> the new pool.

If you're using cephadm / ceph orch you can craft an OSD spec that uses or ignores drives based on size or model.
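As a rough sketch of such a spec, the helper below writes a YAML file that selects drives by model string and tags the resulting OSDs with a device class. The function name, the `service_id`, the `qlc` class name, and the host pattern are all placeholders; the exact model string should be checked against what `ceph orch device ls` reports on your hosts.

```shell
# Write a hypothetical OSD service spec to the path given as $1.
write_qlc_osd_spec() {
    cat > "$1" <<'EOF'
service_type: osd
service_id: qlc_nvme            # placeholder name
placement:
  host_pattern: '*'             # assumption: all hosts are eligible
spec:
  data_devices:
    model: SSDPF2NV153TZ        # or filter by size instead, e.g. size: '10TB:'
  crush_device_class: qlc       # class assigned to OSDs created from this spec
EOF
}

# Preview what the orchestrator would do without deploying anything:
#   write_qlc_osd_spec qlc_osds.yaml
#   ceph orch apply -i qlc_osds.yaml --dry-run
```

A spec like this governs OSDs that cephadm creates; OSDs that already exist keep their current class until it is changed by hand.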

Multiple pools can share OSDs; for your use case, though, you probably don't want that.

> Currently the Ceph cluster has two device classes 'hdd' and 'ssd', and the
> larger 15TB drives were automatically assigned to the 'ssd' device class
> that is in use by a different pool. The `ssd` device classes are used in a
> placement rule targeting that class.

The names of device classes are actually semi-arbitrary.  The above distinction is made on the basis of whether or not the kernel believes a given device to rotate.
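A simplified illustration of that distinction follows. The function name is made up, and this is only an approximation of the default behaviour described above (the kernel's rotational flag decides between `hdd` and `ssd`), not Ceph's actual code path.

```shell
# Guess the default class from a sysfs rotational flag file.
# $1: path such as /sys/block/nvme0n1/queue/rotational
default_class_for() {
    if [ "$(cat "$1")" = "1" ]; then
        echo hdd    # kernel believes the device rotates
    else
        echo ssd    # non-rotational, which includes NVMe drives like these
    fi
}

# To see what the cluster actually assigned:
#   ceph osd crush class ls
#   ceph osd crush class ls-osd ssd
```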

> The documentation describes that I could set a device class for an OSD with
> a command like:
> `ceph osd crush set-device-class CLASS OSD_ID [OSD_ID ..]`
> Class names can be arbitrary strings like 'big_nvme'.

or "qlc"

> Before setting a new
> device class to an OSD that already has an assigned device class, should
> use `ceph osd crush rm-device-class ssd osd.XX`.

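Yep, though note that rm-device-class takes only the OSD name(s), not the class: `ceph osd crush rm-device-class osd.XX`.  I suspect the remove-first requirement is a guardrail to prevent inadvertently trampling an existing class assignment.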
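Put together, the remove-then-set sequence might look like the sketch below. The function name and the example OSD IDs are hypothetical. Note that rm-device-class is given only the OSD name here, with no class argument. Data will start moving as soon as the class changes, so run it on a batch you are prepared to backfill.

```shell
# Move the given OSD IDs to NEW_CLASS: clear the old class, then set the new one.
# Usage: reclass_osds NEW_CLASS OSD_ID [OSD_ID ...]
reclass_osds() {
    new_class=$1
    shift
    for id in "$@"; do
        # An existing class must be removed before a new one can be set.
        ceph osd crush rm-device-class "osd.$id" || return 1
        ceph osd crush set-device-class "$new_class" "osd.$id" || return 1
    done
}

# Example (IDs are placeholders): reclass_osds qlc 12 13
```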

> Can I proceed to directly remove these OSDs from the current device class
> and assign to a new device class?

Carpe NAND!
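Once the OSDs carry the new class, a rule and pool that target it could be created roughly as follows. This is a sketch for a replicated pool only; the function name, the `default` CRUSH root, and every argument value in the example are assumptions to adapt to your cluster.

```shell
# Create a replicated CRUSH rule limited to one device class, then a pool using it.
# Usage: create_class_pool POOL RULE CLASS FAILURE_DOMAIN PG_NUM
create_class_pool() {
    pool=$1; rule=$2; class=$3; domain=$4; pgs=$5
    # Assumes the CRUSH root is named "default".
    ceph osd crush rule create-replicated "$rule" default "$domain" "$class" || return 1
    ceph osd pool create "$pool" "$pgs" "$pgs" replicated "$rule"
}

# Example (all values are placeholders):
#   create_class_pool bigdata qlc_rule qlc host 256
```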

> Should they be moved one by one? What is
> the way to safely protect data from the existing pool that they are mapped
> to?

Are there other SSDs in said existing pool?  If you reassign all of these, will there be enough survivors to meet replication policy and hold all the data?

One by one would be safe.  Doing more than one might be faster and more efficient, depending on your hardware and topology.  For sure you don't want to reassign more than one per CRUSH failure domain at a time (host, rack, depends on your setup).  If your topology, RAM, and clients are amenable, you could do all OSDs in a single failure domain at once, then proceed to the next only after all PGs are active+clean.
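The "wait for active+clean before moving on" step can be scripted. The sketch below parses the one-line summary from `ceph pg stat`, assuming it looks like "128 pgs: 10 active+remapped+backfill_wait, 118 active+clean; ..."; that format is an assumption, so check it against your release before relying on it. The function names are hypothetical.

```shell
# Succeed only if every PG is reported as exactly active+clean.
pgs_all_clean() {
    ceph pg stat | awk '
        NR == 1 {
            total = $1
            clean = 0
            # Keep only the state list before the first ";".
            split($0, halves, ";")
            sub(/^[0-9]+ pgs: /, "", halves[1])
            n = split(halves[1], parts, ", ")
            for (i = 1; i <= n; i++) {
                split(parts[i], kv, " ")
                if (kv[2] == "active+clean") clean = kv[1]
            }
            ok = (total > 0 && clean == total)
        }
        END { if (ok) exit 0; exit 1 }'
}

# Block until the cluster settles; $1 is the poll interval in seconds.
wait_for_clean() {
    until pgs_all_clean; do
        sleep "${1:-30}"
    done
}
```

A per-failure-domain loop would then reassign one host's OSDs, call wait_for_clean, and only then proceed to the next host.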

> Thanks,
>  Matt
> -- 
> Matt Larson, PhD
> Madison, WI  53705 U.S.A.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
