Re: CephFS metadata pool to SSDs

On Thu, Oct 12, 2017 at 9:34 PM, Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
> I found an older ML entry from 2015 and not much else, mostly detailing
> performance testing done to dispel poor performance numbers presented by
> the OP.
>
> Currently I have the metadata pool on my 24 slow HDDs, and am curious
> whether I would see any increased performance with CephFS by moving the
> metadata pool onto SSDs.

It depends a lot on the workload.

The primary advantage of moving metadata to dedicated drives
(especially SSDs) is that it makes the system more deterministic under
load.  The most benefit will be seen on systems which had previously
had shared HDD OSDs that were fully saturated with data IO, and were
consequently suffering from very slow metadata writes.

The impact will also depend on whether the metadata workload fits
within mds_cache_size: if the MDS is frequently missing its cache,
then metadata pool latency will matter more.
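
A quick way to check that (a sketch only; "mds.a" is a placeholder for
your MDS name, and counter names vary a little between releases) is to
look at the perf counters on the MDS host and compare the cached inode
count in the "mds" section against your mds_cache_size:

  # ceph daemon mds.a perf dump
  # ceph tell mds.a injectargs '--mds_cache_size=1000000'

If the inode count sits pinned at the limit and you have RAM to spare,
raising mds_cache_size (the second command, with an example value) may
help as much as moving the pool; Luminous also has
mds_cache_memory_limit for sizing the cache in bytes.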

On systems with plenty of spare IOPS and non-latency-sensitive
workloads, one might see little or no difference in performance when
using SSDs, as those systems typically bottleneck on the number of
operations per second a single MDS daemon can handle (it is CPU
bound).  Systems like that would benefit more from multiple MDS
daemons.
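
For reference, a minimal sketch of adding a second active MDS on
Luminous (assuming the filesystem is called "cephfs"; older releases
need an extra confirmation flag, and you want a standby daemon left
over for failover):

  # ceph fs set cephfs allow_multimds true
  # ceph fs set cephfs max_mds 2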

Then again, systems with plenty of spare IOPS can quickly become
congested during recovery/backfill scenarios, so having SSDs for
metadata is a nice risk mitigation to keep the system more responsive
during bad times.

> My thought is that the SSDs are lower latency, and it removes those iops
> from the slower spinning disks.
>
> My next concern would be write amplification on the SSDs. Would this thrash
> the SSD lifespan with tons of little writes or should it not be too heavy of
> a workload to matter too much?

The MDS is comparatively efficient in how it writes out metadata:
journal writes get batched up into larger IOs, and if something is
frequently modified then it doesn't get written back every time (just
when it falls off the end of the journal, or periodically).
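
If you want to see what the metadata write load actually looks like
before deciding, the per-pool client IO rates are visible with (pool
name taken from your command below):

  # ceph osd pool stats fs-metadata

On most clusters the sustained write rate to the metadata pool is tiny
compared to the data pools.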

If you've got SSDs that you're confident enough to use for data or
general workloads, I wouldn't be too worried about using them for
CephFS metadata.
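
If you do want to keep an eye on it, the drives' own endurance
counters are the thing to watch; roughly (attribute names vary by
vendor, and /dev/sdX is a placeholder):

  # smartctl -a /dev/sdX | grep -iE 'wear|percent|written'

then compare against the drive's rated endurance (TBW/DWPD).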

> My last question from the operations standpoint, if I use:
> # ceph osd pool set fs-metadata crush_ruleset <ssd ruleset>
> Will this just start to backfill the metadata pool over to the SSDs until it
> satisfies the crush requirements for size and failure domains and not skip a
> beat?

On a healthy cluster, yes, this should just work.  The level of impact
you see will depend on how much else you're trying to do with the
system.  The prioritization of client IO vs. backfill IO has been
improved in Luminous, so you should use Luminous if you can.
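
For example, on Luminous with device classes a sketch looks like this
(the rule name is arbitrary, "fs-metadata" is taken from your command,
and note the pool property is called crush_rule rather than
crush_ruleset in Luminous):

  # ceph osd crush rule create-replicated replicated-ssd default host ssd
  # ceph osd pool set fs-metadata crush_rule replicated-ssd

On older releases you would instead point the pool at a ruleset whose
root contains only the SSD OSDs, using crush_ruleset as you wrote.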

Because the metadata pool is small overall, the smart thing is
probably to find a quiet time for your system and make the crush rule
change then, getting it over with quickly rather than trying to do it
during normal operations.
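
While the data moves, something like the following (standard commands;
the injectargs values are just conservative examples) lets you watch
progress and rein in backfill if clients start to notice:

  # ceph -s
  # ceph osd pool stats fs-metadata
  # ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'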

Cheers,
John

>
> Obviously things like enabling dirfrags and multiple MDS ranks will be more
> likely to improve performance with CephFS, but the metadata pool uses very
> little space, and I have the SSDs already, so I figured I would explore it
> as an option.
>
> Thanks,
>
> Reed
>


