Re: metadata server rejoin time

On Thu, Jul 2, 2015 at 11:38 AM, Matteo Dacrema <mdacrema@xxxxxxxx> wrote:
> Hi all,
>
> I'm using CephFS on Hammer with 1.5 million files, 2 metadata servers in an
> active/standby configuration with 8 GB of RAM, 20 clients with 2 GB of RAM
> each, and 2 OSD nodes with 4 x 80 GB OSDs and 4 GB of RAM.
> I've noticed that if I kill the active metadata server, the standby one takes
> about 10 to 30 minutes to go from the rejoin state to active. While it is in
> the rejoin state I can see Ceph allocating RAM on that server.

Do you have example "ceph -s" or "ceph -w" output? There are a few
things that could be making it take a while to finish restarting, but
I don't think it should be stuck in the rejoin state.

Also, how are your clients mounting CephFS?
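
For example, something like the following would capture what I'm after
(illustrative commands only; the exact output format varies by release):

    # cluster health and MDS state while the standby is stuck in rejoin
    ceph -s
    ceph mds stat

    # watch cluster events live while you fail over the MDS
    ceph -w

    # on a client: kernel mount vs. ceph-fuse
    mount | grep ceph
    ps aux | grep ceph-fuse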

>
>
> Here my configuration:
>
> [global]
>         fsid = 2de7b17f-0a3e-4109-b878-c035dd2f7735
>         mon_initial_members = cephmds01
>         mon_host = 10.29.81.161
>         auth_cluster_required = cephx
>         auth_service_required = cephx
>         auth_client_required = cephx
>         public network = 10.29.81.0/24
>         tcp nodelay = true
>         tcp rcvbuf = 0
>         ms tcp read timeout = 600
>
>         #Capacity
>         mon osd full ratio = .95
>         mon osd nearfull ratio = .85
>
>
> [osd]
>         osd journal size = 1024
>         journal dio = true
>         journal aio = true
>
>         osd op threads = 2
>         osd op thread timeout = 60
>         osd disk threads = 2
>         osd recovery threads = 1
>         osd recovery max active = 1
>         osd max backfills = 2
>
>
>         # Pool
>         osd pool default size = 2
>
>         #XFS
>         osd mkfs type = xfs
>         osd mkfs options xfs = "-f -i size=2048"
>         osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog"
>
>         #FileStore Settings
>         filestore xattr use omap = false
>         filestore max inline xattr size = 512
>         filestore max sync interval = 10
>         filestore merge threshold = 40
>         filestore split multiple = 8
>         filestore flusher = false
>         filestore queue max ops = 2000
>         filestore queue max bytes = 536870912
>         filestore queue committing max ops = 500
>         filestore queue committing max bytes = 268435456
>         filestore op threads = 2
>
> [mds]
>         max mds = 1
>         mds cache size = 250000
>         client cache size = 1024

This particular value is only interpreted by userspace clients; it
doesn't do anything in the [mds] section.
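
If you do want to tune the userspace client cache, a [client] section is
where it would take effect. A rough sketch (the value below is just the
default, not a recommendation):

    [client]
            client cache size = 16384

ceph-fuse and libcephfs pick up options from the [global] and [client]
sections of the ceph.conf on the client hosts, so that's where it needs
to live to have any effect.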

>         mds dir commit ratio = 0.5
>
> Best regards,
> Matteo
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


