Hi Greg,

The MDSs detected that the other one went down and started the replay. I did some further testing with 20 client machines. Of the 20, 5 hung with the following errors:

[Aug24 10:53] ceph: mds0 caps stale
[Aug24 10:54] ceph: mds0 caps stale
[Aug24 10:58] ceph: mds0 hung
[Aug24 11:03] ceph: mds0 came back
[ +8.803334] libceph: mon2 10.15.0.3:6789 socket closed (con state OPEN)
[ +0.000018] libceph: mon2 10.15.0.3:6789 session lost, hunting for new mon
[Aug24 11:04] ceph: mds0 reconnect start
[ +0.084938] libceph: mon2 10.15.0.3:6789 session established
[ +0.008475] ceph: mds0 reconnect denied

10.15.0.3 was the active MDS at the time I unplugged the Ethernet cable.
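My working theory for the 5 hangs is that those clients simply missed the reconnect window: by the time they had hunted down a working monitor, the standby MDS on ceph2 had already gone active and refused them. If I have the option name right, that window is mds_reconnect_timeout, which would line up with the "allowed interval 45" in the MDS log further down. Something along these lines, run on the node hosting the active MDS, should show the value and (temporarily) widen it - just a sketch, and the mds.ceph2 name, socket path and 120s value are my assumptions, not something I've tested here:

# Show the reconnect window on the active MDS (run locally on that node).
ceph daemon mds.ceph2 config get mds_reconnect_timeout

# Equivalent, going through the admin socket path directly.
ceph --admin-daemon /var/run/ceph/ceph-mds.ceph2.asok config get mds_reconnect_timeout

# Widen it at runtime (not persisted across an MDS restart).
ceph daemon mds.ceph2 config set mds_reconnect_timeout 120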
This was the output of ceph -w as I ran the test (I've removed a lot of the pg remapping):

2015-08-24 11:02:39.547529 mon.1 [INF] mon.ceph2 calling new monitor election
2015-08-24 11:02:40.011995 mon.0 [INF] mon.ceph1 calling new monitor election
2015-08-24 11:02:45.245869 mon.0 [INF] mon.ceph1@0 won leader election with quorum 0,1
2015-08-24 11:02:45.257440 mon.0 [INF] HEALTH_WARN; 1 mons down, quorum 0,1 ceph1,ceph2
2015-08-24 11:02:45.535369 mon.0 [INF] monmap e1: 3 mons at {ceph1=10.15.0.1:6789/0,ceph2=10.15.0.2:6789/0,ceph3=10.15.0.3:6789/0}
2015-08-24 11:02:45.535444 mon.0 [INF] pgmap v15803: 8256 pgs: 8256 active+clean; 1248 GB data, 2503 GB used, 193 TB / 196 TB avail; 47 B/s wr, 0 op/s
2015-08-24 11:02:45.535541 mon.0 [INF] mdsmap e38: 1/1/1 up {0=ceph3=up:active}, 2 up:standby
2015-08-24 11:02:45.535629 mon.0 [INF] osdmap e197: 36 osds: 36 up, 36 in
2015-08-24 11:03:01.946397 mon.0 [INF] mdsmap e39: 1/1/1 up {0=ceph2=up:replay}, 1 up:standby
2015-08-24 11:03:02.993880 mon.0 [INF] mds.0 10.15.0.2:6849/17644 up:reconnect
2015-08-24 11:03:02.993930 mon.0 [INF] mdsmap e40: 1/1/1 up {0=ceph2=up:reconnect}, 1 up:standby
2015-08-24 11:03:51.461248 mon.0 [INF] mds.0 10.15.0.2:6849/17644 up:rejoin
2015-08-24 11:03:55.807131 mon.0 [INF] mds.0 10.15.0.2:6849/17644 up:active
2015-08-24 11:03:55.807195 mon.0 [INF] mdsmap e42: 1/1/1 up {0=ceph2=up:active}, 1 up:standby
2015-08-24 11:06:48.036736 mon.0 [INF] mds.0 10.15.0.2:6849/17644 up:active
2015-08-24 11:06:48.036799 mon.0 [INF] mdsmap e43: 1/1/1 up {0=ceph2=up:active}, 1 up:standby

*<cable plugged back in>*

2015-08-24 11:13:13.230714 mon.0 [INF] osd.32 10.15.0.3:6832/11565 boot
2015-08-24 11:13:13.230765 mon.0 [INF] osdmap e212: 36 osds: 25 up, 25 in
2015-08-24 11:13:13.230809 mon.0 [INF] mds.? 10.15.0.3:6833/16993 up:boot
2015-08-24 11:13:13.230837 mon.0 [INF] mdsmap e47: 1/1/1 up {0=ceph2=up:active}, 2 up:standby
2015-08-24 11:13:30.799429 mon.2 [INF] mon.ceph3 calling new monitor election
2015-08-24 11:13:30.826158 mon.0 [INF] mon.ceph1 calling new monitor election
2015-08-24 11:13:30.926331 mon.0 [INF] mon.ceph1@0 won leader election with quorum 0,1,2
2015-08-24 11:13:30.968739 mon.0 [INF] mdsmap e47: 1/1/1 up {0=ceph2=up:active}, 2 up:standby
2015-08-24 11:13:28.383203 mds.0 [INF] denied reconnect attempt (mds is up:active) from client.24155 10.10.10.95:0/3238635414 after 625.375507 (allowed interval 45)
2015-08-24 11:13:29.721653 mds.0 [INF] denied reconnect attempt (mds is up:active) from client.24146 10.10.10.99:0/3454703638 after 626.713952 (allowed interval 45)
2015-08-24 11:13:31.113004 mds.0 [INF] denied reconnect attempt (mds is up:active) from client.24140 10.10.10.60:0/359606080 after 628.105302 (allowed interval 45)
2015-08-24 11:13:50.933020 mds.0 [INF] denied reconnect attempt (mds is up:active) from client.24152 10.10.10.67:0/3475305031 after 647.925323 (allowed interval 45)
2015-08-24 11:13:51.037681 mds.0 [INF] denied reconnect attempt (mds is up:active) from client.24149 10.10.10.68:0/22416725 after 648.029988 (allowed interval 45)

I did just notice that none of the times match up between the machines, so I may try again once I fix ntp/chrony and see if that makes a difference.
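Before the re-test I'll sanity-check the clocks on each node with something like the below - a rough sketch; we're standardising on chrony, so the ntpq line only matters where ntpd is still running:

# How far off is the local clock, and is a time source actually selected?
chronyc tracking
chronyc sources -v

# For any node still running ntpd instead of chrony.
ntpq -p

# The monitors also flag skew once it exceeds mon_clock_drift_allowed (0.05s by default).
ceph health detail | grep -i skew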
Cheers,

Simon

> -----Original Message-----
> From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
> Sent: 21 August 2015 12:16
> To: Simon Hallam
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Testing CephFS
>
> On Thu, Aug 20, 2015 at 11:07 AM, Simon Hallam <sha@xxxxxxxxx> wrote:
> > Hey all,
> >
> > We are currently testing CephFS on a small (3 node) cluster.
> >
> > The setup is currently:
> >
> > Each server has 12 OSDs, 1 Monitor and 1 MDS running on it:
> > The servers are running: 0.94.2-0.el7
> > The clients are running: Ceph: 0.80.10-1.fc21, Kernel: 4.0.6-200.fc21.x86_64
> >
> > ceph -s
> >     cluster 4ed5ecdd-0c5b-4422-9d99-c9e42c6bd4cd
> >      health HEALTH_OK
> >      monmap e1: 3 mons at {ceph1=10.15.0.1:6789/0,ceph2=10.15.0.2:6789/0,ceph3=10.15.0.3:6789/0}
> >             election epoch 20, quorum 0,1,2 ceph1,ceph2,ceph3
> >      mdsmap e12: 1/1/1 up {0=ceph3=up:active}, 2 up:standby
> >      osdmap e389: 36 osds: 36 up, 36 in
> >       pgmap v19370: 8256 pgs, 3 pools, 51217 MB data, 14035 objects
> >             95526 MB used, 196 TB / 196 TB avail
> >                 8256 active+clean
> >
> > Our Ceph.conf is relatively simple at the moment:
> >
> > cat /etc/ceph/ceph.conf
> > [global]
> > fsid = 4ed5ecdd-0c5b-4422-9d99-c9e42c6bd4cd
> > mon_initial_members = ceph1, ceph2, ceph3
> > mon_host = 10.15.0.1,10.15.0.2,10.15.0.3
> > mon_pg_warn_max_per_osd = 1000
> > auth_cluster_required = cephx
> > auth_service_required = cephx
> > auth_client_required = cephx
> > filestore_xattr_use_omap = true
> > osd_pool_default_size = 2
> >
> > When I pulled the plug on the master MDS last time (ceph1), it stopped all
> > IO until I plugged it back in. I was under the assumption that the MDS would
> > fail over the other 2 MDS’s and IO would continue?
> >
> > Is there something I need to do to allow the MDS’s to failover from each
> > other without too much interruption? Or is this because the clients ceph
> > version?
>
> That's quite strange. How long did you wait for it to fail over? Did
> the output of "ceph -s" (or "ceph -w", whichever) change during that
> time?
> By default the monitors should have detected the MDS was dead after 30
> seconds and put one of the other MDS nodes into replay and active.
>
> ...I wonder if this is because you lost a monitor at the same time as
> the MDS. What kind of logging do you have available from during your
> test?
> -Greg
> >
> > Cheers,
> >
> > Simon Hallam
> > Linux Support & Development Officer