Re: cephfs change metadata pool?

I am using 10G InfiniBand for the cluster network and 1G Ethernet for the public network. Because I don't have enough slots in the nodes, I am using three files on the OS drive (an SSD) for journaling, which improved the problem considerably but did not solve it entirely.
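
For anyone with a similar setup, a journal kept as a plain file on the OS SSD is usually expressed in ceph.conf along these lines (the path and the size here are only illustrative, not my exact layout):

    [osd]
    # journal as a plain file on the OS SSD rather than a raw partition
    osd journal = /var/lib/ceph/journals/$cluster-$id/journal
    osd journal size = 5120    ; size in MB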

I am quite happy with the current throughput, which ranges from 200 MB/s to 400 MB/s for sequential writes, depending on the block size. The problem is that when I transfer data to the CephFS at a rate below 100 MB/s, I start to see slow/blocked request warnings in "ceph -w" after a few minutes. They are not tied to any particular OSDs, so I began to wonder whether my configuration is correct or whether upgrading to Jewel would solve it.
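
When the warnings appear, they can usually be narrowed down with something like the following (the OSD id is a placeholder; run the second command on the host that owns that OSD):

    ceph health detail                        # shows which OSDs currently have blocked requests
    ceph daemon osd.<id> dump_historic_ops    # admin socket: the slowest recent ops, with per-step timings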

There are about 5,000,000 objects currently in the cluster.

Thanks for the hints.

On Tue, Jul 12, 2016 at 8:19 PM, Christian Balzer <chibi@xxxxxxx> wrote:

Hello,

On Tue, 12 Jul 2016 19:54:38 -0500 Di Zhang wrote:

> It's a 5-node cluster. Each node has 3 OSDs. I set pg_num = 512 for both
> cephfs_data and cephfs_metadata. I experienced some slow/blocked request
> issues when I was using Hammer 0.94.x and earlier, so I was wondering
> whether the pg_num is too large for metadata.

Very, VERY much doubt this.

Your "ideal" values for a cluster of this size (are you planning to grow
it?) would be about 1024 PGs for data and 128 or 256 PGs for meta-data.

Not really that far off and more importantly not overloading the OSDs with
too many PGs in total. Or do you have more pools?
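
For reference, something like this shows what you actually have today (the PGS column in the second command is the per-OSD total, which is the number that matters here):

    ceph osd dump | grep ^pool    # pg_num/pgp_num and replica size per pool
    ceph osd df                   # per-OSD utilisation, including a PGS column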


> I just upgraded the cluster to Jewel
> today. I will watch whether the problem remains.
>
Jewel improvements might mask things, but I'd venture that your problems
were caused by your HW not being sufficient for the load.

As in, do you use SSD journals, etc?
How many IOPS do you need/expect from your CephFS?
How many objects are in there?
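
If it helps to quantify those, a short rados bench run against the data pool gives a rough write baseline, and rados df shows per-pool object counts (the pool name below is assumed):

    rados bench -p cephfs_data 30 write    # 30-second write benchmark, 4 MB objects by default
    rados df                               # objects and space used per pool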

Christian

> Thank you.
>
> On Tue, Jul 12, 2016 at 6:45 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> > I'm not at all sure that rados cppool actually captures everything (it
> > might). Doug has been working on some similar stuff for disaster
> > recovery testing and can probably walk you through moving over.
> >
> > But just how large *is* your metadata pool in relation to others?
> > Having a too-large pool doesn't cost much unless it's
> > grossly-inflated, and having a nice distribution of your folders is
> > definitely better than not.
> > -Greg
> >
> > On Tue, Jul 12, 2016 at 4:14 PM, Di Zhang <zhangdibio@xxxxxxxxx> wrote:
> > > Hi,
> > >
> > >     Is there any way to change the metadata pool for a cephfs without
> > > losing any existing data? I know how to clone the metadata pool using
> > > rados cppool. But the filesystem still links to the original metadata
> > > pool no matter what you name it.
> > >
> > >     The motivation here is to decrease the pg_num of the metadata pool. I
> > > created this cephfs cluster some time ago, before I realized that I
> > > shouldn't assign a large pg_num to such a small pool.
> > >
> > >     I'm not sure if I can delete the fs and re-create it using the
> > > existing data pool and the cloned metadata pool.
> > >
> > >     Thank you.
> > >
> > >
> > > Zhang Di
> > >
> >


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
