As always, appreciate the help and knowledge of the collective ML mind.

> If you aren't using DC SSDs and this is prod, then I wouldn't recommend
> moving towards this model.

These are Samsung SM863a and Micron 5100 MAX drives, all roughly 6-12 months
old, with the most worn drive showing 23 P/E cycles so far.

Thanks again,

Reed

> On Oct 12, 2017, at 4:18 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>
> On Thu, Oct 12, 2017 at 9:34 PM, Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
>> I found an older ML entry from 2015 and not much else, mostly detailing
>> performance testing done to dispel poor performance numbers presented by
>> the OP.
>>
>> Currently I have the metadata pool on my 24 slow HDDs, and am curious
>> whether I should see any increased CephFS performance by moving the
>> metadata pool onto SSD media.
>
> It depends a lot on the workload.
>
> The primary advantage of moving metadata to dedicated drives
> (especially SSDs) is that it makes the system more deterministic under
> load. The most benefit will be seen on systems which previously had
> shared HDD OSDs that were fully saturated with data IO, and were
> consequently suffering from very slow metadata writes.
>
> The impact will also depend on whether the metadata workload fits in
> mds_cache_size or not: if the MDS is frequently missing its cache,
> then the metadata pool latency will matter more.
>
> On systems with plenty of spare IOPS and non-latency-sensitive
> workloads, one might see little or no difference in performance when
> using SSDs, as those systems typically bottleneck on the number of
> operations per second the MDS daemon can handle (CPU bound). Systems
> like that would benefit more from multiple MDS daemons.
>
> Then again, systems with plenty of spare IOPS can quickly become
> congested during recovery/backfill scenarios, so having SSDs for
> metadata is a nice risk mitigation that keeps the system more
> responsive during bad times.
>
>> My thought is that the SSDs are lower latency, and it removes those
>> IOPS from the slower spinning disks.
>>
>> My next concern would be write amplification on the SSDs. Would this
>> thrash the SSD lifespan with tons of little writes, or should the
>> workload not be heavy enough to matter much?
>
> The MDS is comparatively efficient in how it writes out metadata:
> journal writes get batched up into larger IOs, and if something is
> frequently modified then it doesn't get written back every time (just
> when it falls off the end of the journal, or periodically).
>
> If you've got SSDs that you're confident enough to use for data or
> general workloads, I wouldn't be too worried about using them for
> CephFS metadata.
>
>> My last question, from the operations standpoint: if I use
>> # ceph osd pool set fs-metadata crush_ruleset <ssd ruleset>
>> will this just start to backfill the metadata pool over to the SSDs
>> until it satisfies the crush requirements for size and failure domains,
>> without skipping a beat?
>
> On a healthy cluster, yes, this should just work. The level of impact
> you see will depend on how much else you're trying to do with the
> system. The prioritization of client IO vs. backfill IO has been
> improved in luminous, so you should use luminous if you can.
>
> Because the overall size of the metadata pool is small, the smart
> thing is probably to find a time that is quiet for your system and do
> the crush rule change then, to get it over with quickly, rather than
> trying to do it during normal operations.
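
A rough sketch of one way the change John describes could be expressed on a
luminous cluster using device classes. The pool name fs-metadata comes from
Reed's command; the rule name fs-meta-ssd, the crush root "default" and the
failure domain "host" below are only placeholders to adapt to your own
topology.

Create a replicated rule that only selects OSDs carrying the ssd device class:
# ceph osd crush rule create-replicated fs-meta-ssd default host ssd

Point the metadata pool at the new rule. On luminous the pool setting is
crush_rule and takes a rule name; pre-luminous clusters use crush_ruleset
with a numeric ruleset id, as in Reed's command:
# ceph osd pool set fs-metadata crush_rule fs-meta-ssd

Optionally throttle backfill while the PGs move, and watch progress:
# ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
# ceph osd pool stats fs-metadata
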
>
> Cheers,
> John
>
>> Obviously things like enabling dirfrags and multiple MDS ranks will be
>> more likely to improve performance with CephFS, but the metadata pool
>> uses very little space, and I have the SSDs already, so I figured I
>> would explore it as an option.
>>
>> Thanks,
>>
>> Reed
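
On the dirfrag and multiple-MDS-rank side that Reed mentions, the knobs on a
luminous filesystem would look roughly like the following; the filesystem
name cephfs is a placeholder, and luminous already enables dirfrags by
default on newly created filesystems.

Allow directory fragmentation (only needed on filesystems created before
luminous):
# ceph fs set cephfs allow_dirfrags true

Allow, then raise, the number of active MDS ranks:
# ceph fs set cephfs allow_multimds true
# ceph fs set cephfs max_mds 2

To judge whether the MDS cache (mds_cache_size) is actually being missed,
the admin socket counters on the MDS host are a reasonable starting point:
# ceph daemon mds.<id> perf dump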