Re: New CRUSH device class questions

On Tue, Aug 6, 2019 at 7:56 PM Konstantin Shalygin <k0ste@xxxxxxxx> wrote:

Is it possible to add a new device class like 'metadata'?

Yes, but you don't need to. Just use your existing class with another CRUSH ruleset.
 
Maybe it's the lateness of the day, but I'm not sure how to do that. Do you have an example where all the OSDs are of class ssd?
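The closest I can come up with is restricting a rule to the device class, something like the following (untested; "meta-ssd" and "cephfs_metadata" are just placeholder names):

# Replicated rule that only picks OSDs of class ssd, with host as the failure domain
ceph osd crush rule create-replicated meta-ssd default host ssd
# Point the metadata pool at that rule
ceph osd pool set cephfs_metadata crush_rule meta-ssd

but I don't see how that separates anything out when every OSD is already class ssd.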

If I set the device class manually, will it be overwritten when the OSD
boots up?

Nope. Classes are assigned automatically when the OSD is created, not when it boots.

That's good to know. 
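For the record, I assume a manual override would look roughly like this (untested; osd.12 is just an example ID):

# set-device-class refuses to overwrite an existing class, so clear it first
ceph osd crush rm-device-class osd.12
ceph osd crush set-device-class nvme osd.12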

I read https://ceph.com/community/new-luminous-crush-device-classes/ and it
mentions that Ceph automatically classifies OSDs into hdd, ssd, and nvme. Hence
the question.

But it's not magic. Sometimes a drive can be a SATA SSD, but the kernel still reports it as 'rotational'...

I see, so it's not looking at whether the device sits under /sys/class/pci or something, just at what the kernel reports.
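For what it's worth, I believe the kernel's view can be checked straight from sysfs (sda standing in for whatever device backs the OSD):

# 1 = rotational (auto-classified as hdd), 0 = non-rotational
cat /sys/block/sda/queue/rotational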
We will still have 13 OSDs, which will be overkill space-wise for metadata, but
since Ceph lacks a reserved-space feature, we don't have many options. This
cluster is so fast that it can fill up in the blink of an eye.


Not true. You can always set a per-pool quota in bytes, for example:

* your meta is 1G;

* your raw space is 300G;

* your data is 90G;

Set the quota on your data pool: `ceph osd pool set-quota <data_pool> max_bytes 96636762000`

Yes, we can set quotas to limit space usage (or the number of objects), but you cannot reserve space that other pools are unable to use. The problem is that if we set the CephFS data pool quota to the equivalent of 95% of capacity, there are at least two scenarios that make the quota useless:

1. A host fails and the cluster recovers. The quota is now beyond the capacity of the cluster, so if the data pool fills up, no pool can write.
2. The CephFS data pool is erasure coded and shares OSDs with an RGW data pool that is 3x replicated. If more writes land in the RGW data pool, the quota again ends up beyond the capacity of the cluster.

Both of these leave metadata operations uncommitted and cause lots of problems with CephFS (you can't list a directory that contains a broken inode). We would rather get a truncated file than a broken file system.

I wrote a script that calculates 95% of the pool capacity and sets the quota if the current quota is more than 1% out of balance. It is run by cron every 5 minutes.
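It is roughly along these lines (a simplified sketch rather than the exact script; it assumes jq is installed and that the `ceph df` JSON reports bytes_used and max_avail per pool, which varies a little between releases):

#!/bin/bash
# Keep the data pool quota at ~95% of what the pool could hold right now.
# "cephfs_data" is just an example pool name.
POOL=cephfs_data

df_json=$(ceph df --format=json)
used=$(echo "$df_json"  | jq ".pools[] | select(.name==\"$POOL\") | .stats.bytes_used")
avail=$(echo "$df_json" | jq ".pools[] | select(.name==\"$POOL\") | .stats.max_avail")

# Pool capacity = what is already stored plus what could still be stored.
capacity=$((used + avail))
target=$((capacity * 95 / 100))

current=$(ceph osd pool get-quota "$POOL" --format=json | jq '.quota_max_bytes')

# Only rewrite the quota if it has drifted more than ~1% of capacity.
drift=$((target - current))
[ "$drift" -lt 0 ] && drift=$((-drift))
if [ "$drift" -gt $((capacity / 100)) ]; then
    ceph osd pool set-quota "$POOL" max_bytes "$target"
fi

A crontab entry along the lines of `*/5 * * * * /usr/local/bin/cephfs-quota.sh` (the path is made up) keeps it applied every five minutes.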

If there is a way to reserve some capacity for a pool that no other pool can use, please provide an example. Think of reserved inode space in ext4/XFS/etc.

Thank you.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
