Re: Testing CephFS

On Thu, Aug 20, 2015 at 11:07 AM, Simon  Hallam <sha@xxxxxxxxx> wrote:
> Hey all,
>
> We are currently testing CephFS on a small (3 node) cluster.
>
> The setup is currently:
>
> Each server has 12 OSDs, 1 Monitor and 1 MDS running on it:
> The servers are running: 0.94.2-0.el7
> The clients are running: Ceph: 0.80.10-1.fc21, Kernel: 4.0.6-200.fc21.x86_64
>
> ceph -s
>     cluster 4ed5ecdd-0c5b-4422-9d99-c9e42c6bd4cd
>      health HEALTH_OK
>      monmap e1: 3 mons at
> {ceph1=10.15.0.1:6789/0,ceph2=10.15.0.2:6789/0,ceph3=10.15.0.3:6789/0}
>             election epoch 20, quorum 0,1,2 ceph1,ceph2,ceph3
>      mdsmap e12: 1/1/1 up {0=ceph3=up:active}, 2 up:standby
>      osdmap e389: 36 osds: 36 up, 36 in
>       pgmap v19370: 8256 pgs, 3 pools, 51217 MB data, 14035 objects
>             95526 MB used, 196 TB / 196 TB avail
>                 8256 active+clean
>
> Our ceph.conf is relatively simple at the moment:
>
> cat /etc/ceph/ceph.conf
> [global]
> fsid = 4ed5ecdd-0c5b-4422-9d99-c9e42c6bd4cd
> mon_initial_members = ceph1, ceph2, ceph3
> mon_host = 10.15.0.1,10.15.0.2,10.15.0.3
> mon_pg_warn_max_per_osd = 1000
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 2
>
> When I pulled the plug on the master MDS last time (ceph1), it stopped all
> IO until I plugged it back in. I was under the assumption that the MDS would
> fail over to the other 2 MDSs and IO would continue?
>
> Is there something I need to do to allow the MDSs to fail over from each
> other without too much interruption? Or is this because of the clients'
> Ceph version?

That's quite strange. How long did you wait for it to fail over? Did
the output of "ceph -s" (or "ceph -w", whichever) change during that
time?
By default the monitors should have detected that the MDS was dead
after 30 seconds and promoted one of the standby MDS daemons through
replay to active.
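
For reference, the failover window is governed by the MDS beacon
settings. A minimal ceph.conf sketch, assuming the Hammer-era option
names (the values shown are illustrative, not recommendations; check
the defaults for your release):

[mds]
# How often each MDS daemon sends a beacon to the monitors (seconds).
mds_beacon_interval = 4
# How long the monitors wait without beacons before marking the MDS
# laggy/dead and promoting a standby (seconds).
mds_beacon_grace = 15
# Optional: have a standby tail the active MDS's journal so replay on
# takeover is faster.
mds_standby_replay = true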

...I wonder if this is because you lost a monitor at the same time as
the MDS. What logging do you have available from the time of your
test?
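
If you can rerun the test, something like the following (a sketch;
substitute your own daemon ids and paths) would capture the useful
state:

# In one terminal, watch cluster events while you pull the plug:
ceph -w
# Afterwards, confirm the surviving monitors re-formed quorum:
ceph quorum_status
# And check how the MDS map changed:
ceph mds dump
# The daemon logs on the surviving nodes are under:
#   /var/log/ceph/ceph-mon.<id>.log
#   /var/log/ceph/ceph-mds.<id>.log
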
-Greg

>
> Cheers,
>
> Simon Hallam
> Linux Support & Development Officer
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



