On Mon, Sep 26, 2016 at 8:28 AM, David <dclistslinux@xxxxxxxxx> wrote:
> Ryan, a team at Ebay recently did some metadata testing, have a search on
> this list. Pretty sure they found there wasn't a huge benefit in putting the
> metadata pool on solid state. As Christian says, it's all about RAM and CPU.
> You want to get as many inodes into cache as possible.

This is generally good advice, but we're not quite at the point of saying
that SSDs are not useful.

In the case where the working set exceeds the MDS cache size, and the
metadata access is random, the MDS will generate a large quantity of small,
latency-sensitive IOs to read in metadata. In that kind of case, having those
reads go to dedicated SSDs (as opposed to the same spindles as the bulk data)
may well give better (and certainly more predictable) performance.

As always, this is a relatively young area and empirical testing is needed.

John

>
>
> On 26 Sep 2016 2:09 a.m., "Christian Balzer" <chibi@xxxxxxx> wrote:
>>
>> Hello,
>>
>> On Sun, 25 Sep 2016 19:51:25 -0400 (EDT) Tyler Bishop wrote:
>>
>> > 800TB of NVMe? That sounds wonderful!
>> >
>> That's not what he wrote at all.
>> 800TB capacity, of which the metadata will likely be a small fraction.
>>
>> As for the OP, try your google-fu on the ML archives, this of course has
>> been discussed before.
>> See the "CephFS in the wild" thread from 3 months ago, for example.
>>
>> In short, you need to have an idea of the number of files and calculate
>> 2KB per object (file).
>> Plus some overhead for the underlying OSD FS, for the time being at least.
>>
>> And while having the metadata pool on fast storage certainly won't hurt,
>> the consensus here seems to be that the CPU (few, fast cores) and RAM of
>> the MDS have a much higher priority/benefit.
>>
>> Christian
>> >
>> > ----- Original Message -----
>> > From: "Ryan Leimenstoll" <rleimens@xxxxxxxxxxxxxx>
>> > To: "ceph new" <ceph-users@xxxxxxxxxxxxxx>
>> > Sent: Saturday, September 24, 2016 5:37:08 PM
>> > Subject: CephFS metadata pool size
>> >
>> > Hi all,
>> >
>> > We are in the process of expanding our current Ceph deployment (Jewel,
>> > 10.2.2) to incorporate CephFS for fast, network-attached scratch storage.
>> > We are looking to have the metadata pool exist entirely on SSDs (or NVMe);
>> > however, I am not sure how big to expect this pool to grow. Is there any
>> > good rule of thumb or guidance for getting an estimate on this before
>> > purchasing hardware? We are expecting upwards of 800T usable capacity at
>> > the start.
>> >
>> > Thanks for any insight!
>> >
>> > Ryan Leimenstoll
>> > rleimens@xxxxxxxxxxxxxx
>> > University of Maryland Institute for Advanced Computer Studies
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
>> http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
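
To put Christian's rule of thumb above into rough numbers, here is a minimal
back-of-the-envelope sketch in Python, assuming ~2KB of metadata per file plus
some overhead for the underlying OSD filesystem. The 100-million-file count,
the 1.2x overhead factor and the 3x replication are illustrative assumptions
only, not figures taken from this thread:

# Back-of-the-envelope CephFS metadata pool sizing, following the
# ~2KB-per-file rule of thumb discussed in the thread. The overhead
# factor, replication size and file count are assumptions for
# illustration, not measured values.

def metadata_pool_estimate(num_files,
                           bytes_per_file=2 * 1024,  # ~2KB per file/object
                           fs_overhead=1.2,          # assumed OSD FS overhead
                           replication=3):           # assumed pool size
    logical = num_files * bytes_per_file * fs_overhead
    return logical, logical * replication

if __name__ == "__main__":
    # Example: 100 million files (an assumption, not from the thread).
    logical, raw = metadata_pool_estimate(100_000_000)
    print(f"logical metadata:      {logical / 1024**3:.1f} GiB")
    print(f"raw incl. replication: {raw / 1024**3:.1f} GiB")

With those assumptions the metadata pool works out to a few hundred GiB of raw
space, i.e. a small fraction of 800T of bulk capacity, which is consistent with
Christian's point above. Actual numbers depend on file count and should be
verified empirically, as John notes.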