On Mon, Aug 24, 2015 at 12:53 PM, Simon Hallam <sha@xxxxxxxxx> wrote:
> The clients are:
> [root@gridnode50 ~]# uname -a
> Linux gridnode50 4.0.8-200.fc21.x86_64 #1 SMP Fri Jul 10 21:09:54 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> [root@gridnode50 ~]# ceph -v
> ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
>
> I don't think it is a reconnect timeout, as they don't even attempt to reconnect until I plug the Ethernet cable back into the original MDS?

Right, the default timeout is going to be 15 seconds. But the clients
should be getting MDSMap updates from the monitor that tell them the
MDS has failed over. It looks like they're not timing out, and then
when the MDS *does* come back it tells them about its own death. Is
that possible, Zheng?
-Greg

>
> Cheers,
>
> Simon
>
>> -----Original Message-----
>> From: Yan, Zheng [mailto:zyan@xxxxxxxxxx]
>> Sent: 24 August 2015 12:28
>> To: Simon Hallam
>> Cc: ceph-users@xxxxxxxxxxxxxx; Gregory Farnum
>> Subject: Re: Testing CephFS
>>
>>
>> > On Aug 24, 2015, at 18:38, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> >
>> > On Mon, Aug 24, 2015 at 11:35 AM, Simon Hallam <sha@xxxxxxxxx> wrote:
>> >> Hi Greg,
>> >>
>> >> The MDSs detected that the other one went down and started the replay.
>> >>
>> >> I did some further testing with 20 client machines. Of the 20 client machines, 5 hung with the following error:
>> >>
>> >> [Aug24 10:53] ceph: mds0 caps stale
>> >> [Aug24 10:54] ceph: mds0 caps stale
>> >> [Aug24 10:58] ceph: mds0 hung
>> >> [Aug24 11:03] ceph: mds0 came back
>> >> [ +8.803334] libceph: mon2 10.15.0.3:6789 socket closed (con state OPEN)
>> >> [ +0.000018] libceph: mon2 10.15.0.3:6789 session lost, hunting for new mon
>> >> [Aug24 11:04] ceph: mds0 reconnect start
>> >> [ +0.084938] libceph: mon2 10.15.0.3:6789 session established
>> >> [ +0.008475] ceph: mds0 reconnect denied
>> >
>> > Oh, this might be a kernel bug, failing to ask for mdsmap updates when
>> > the connection goes away. Zheng, does that sound familiar?
>> > -Greg
>>
>> This seems like a reconnect timeout. You can try enlarging the
>> mds_reconnect_timeout config option.
>>
>> Which kernel version are you using?
>>
>> Yan, Zheng
>>
>> >
>> >>
>> >> 10.15.0.3 was the active MDS at the time I unplugged the Ethernet cable.
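For anyone reproducing this, Zheng's mds_reconnect_timeout suggestion is an MDS-side setting. A minimal sketch of how it could be raised is below; the 90-second value is purely illustrative, and it assumes the usual default of 45 seconds (the "allowed interval 45" that shows up in the MDS log further down).

  # illustrative only; pick a reconnect window that suits your failover testing
  [mds]
      mds reconnect timeout = 90    # default is 45 seconds

  # or, for example, injected into a running MDS without a restart:
  ceph tell mds.ceph2 injectargs '--mds_reconnect_timeout 90'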
>> >>
>> >>
>> >> This was the output of ceph -w as I ran the test (I've removed a lot of the pg remapping):
>> >>
>> >> 2015-08-24 11:02:39.547529 mon.1 [INF] mon.ceph2 calling new monitor election
>> >> 2015-08-24 11:02:40.011995 mon.0 [INF] mon.ceph1 calling new monitor election
>> >> 2015-08-24 11:02:45.245869 mon.0 [INF] mon.ceph1@0 won leader election with quorum 0,1
>> >> 2015-08-24 11:02:45.257440 mon.0 [INF] HEALTH_WARN; 1 mons down, quorum 0,1 ceph1,ceph2
>> >> 2015-08-24 11:02:45.535369 mon.0 [INF] monmap e1: 3 mons at {ceph1=10.15.0.1:6789/0,ceph2=10.15.0.2:6789/0,ceph3=10.15.0.3:6789/0}
>> >> 2015-08-24 11:02:45.535444 mon.0 [INF] pgmap v15803: 8256 pgs: 8256 active+clean; 1248 GB data, 2503 GB used, 193 TB / 196 TB avail; 47 B/s wr, 0 op/s
>> >> 2015-08-24 11:02:45.535541 mon.0 [INF] mdsmap e38: 1/1/1 up {0=ceph3=up:active}, 2 up:standby
>> >> 2015-08-24 11:02:45.535629 mon.0 [INF] osdmap e197: 36 osds: 36 up, 36 in
>> >> 2015-08-24 11:03:01.946397 mon.0 [INF] mdsmap e39: 1/1/1 up {0=ceph2=up:replay}, 1 up:standby
>> >> 2015-08-24 11:03:02.993880 mon.0 [INF] mds.0 10.15.0.2:6849/17644 up:reconnect
>> >> 2015-08-24 11:03:02.993930 mon.0 [INF] mdsmap e40: 1/1/1 up {0=ceph2=up:reconnect}, 1 up:standby
>> >> 2015-08-24 11:03:51.461248 mon.0 [INF] mds.0 10.15.0.2:6849/17644 up:rejoin
>> >> 2015-08-24 11:03:55.807131 mon.0 [INF] mds.0 10.15.0.2:6849/17644 up:active
>> >> 2015-08-24 11:03:55.807195 mon.0 [INF] mdsmap e42: 1/1/1 up {0=ceph2=up:active}, 1 up:standby
>> >> 2015-08-24 11:06:48.036736 mon.0 [INF] mds.0 10.15.0.2:6849/17644 up:active
>> >> 2015-08-24 11:06:48.036799 mon.0 [INF] mdsmap e43: 1/1/1 up {0=ceph2=up:active}, 1 up:standby
>> >> *<cable plugged back in>*
>> >> 2015-08-24 11:13:13.230714 mon.0 [INF] osd.32 10.15.0.3:6832/11565 boot
>> >> 2015-08-24 11:13:13.230765 mon.0 [INF] osdmap e212: 36 osds: 25 up, 25 in
>> >> 2015-08-24 11:13:13.230809 mon.0 [INF] mds.? 10.15.0.3:6833/16993 up:boot
>> >> 2015-08-24 11:13:13.230837 mon.0 [INF] mdsmap e47: 1/1/1 up {0=ceph2=up:active}, 2 up:standby
>> >> 2015-08-24 11:13:30.799429 mon.2 [INF] mon.ceph3 calling new monitor election
>> >> 2015-08-24 11:13:30.826158 mon.0 [INF] mon.ceph1 calling new monitor election
>> >> 2015-08-24 11:13:30.926331 mon.0 [INF] mon.ceph1@0 won leader election with quorum 0,1,2
>> >> 2015-08-24 11:13:30.968739 mon.0 [INF] mdsmap e47: 1/1/1 up {0=ceph2=up:active}, 2 up:standby
>> >> 2015-08-24 11:13:28.383203 mds.0 [INF] denied reconnect attempt (mds is up:active) from client.24155 10.10.10.95:0/3238635414 after 625.375507 (allowed interval 45)
>> >> 2015-08-24 11:13:29.721653 mds.0 [INF] denied reconnect attempt (mds is up:active) from client.24146 10.10.10.99:0/3454703638 after 626.713952 (allowed interval 45)
>> >> 2015-08-24 11:13:31.113004 mds.0 [INF] denied reconnect attempt (mds is up:active) from client.24140 10.10.10.60:0/359606080 after 628.105302 (allowed interval 45)
>> >> 2015-08-24 11:13:50.933020 mds.0 [INF] denied reconnect attempt (mds is up:active) from client.24152 10.10.10.67:0/3475305031 after 647.925323 (allowed interval 45)
>> >> 2015-08-24 11:13:51.037681 mds.0 [INF] denied reconnect attempt (mds is up:active) from client.24149 10.10.10.68:0/22416725 after 648.029988 (allowed interval 45)
>> >>
>> >> I did just notice that none of the times match up, so I may try again once I fix ntp/chrony and see if that makes a difference.
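One note on those numbers: the reconnect window opened at 11:03:02 and the denials were logged roughly 625 to 648 seconds later, which lines up with the cable being plugged back in at around 11:13 rather than with any clock problem, so the "allowed interval 45" denials look consistent with Greg's reading. Still, a quick clock sanity check before the next run is cheap; the sketch below assumes chrony and passwordless ssh between the nodes (substitute ntpq -p if they run ntpd).

  # check that the test nodes agree on the time before re-running the failover test
  for h in ceph1 ceph2 ceph3 gridnode50; do
      ssh "$h" 'hostname; chronyc tracking | grep "System time"'
  done
  ceph health detail    # the monitors also flag clock skew between themselves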
>> >>
>> >> Cheers,
>> >>
>> >> Simon
>> >>
>> >>> -----Original Message-----
>> >>> From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
>> >>> Sent: 21 August 2015 12:16
>> >>> To: Simon Hallam
>> >>> Cc: ceph-users@xxxxxxxxxxxxxx
>> >>> Subject: Re: Testing CephFS
>> >>>
>> >>> On Thu, Aug 20, 2015 at 11:07 AM, Simon Hallam <sha@xxxxxxxxx> wrote:
>> >>>> Hey all,
>> >>>>
>> >>>> We are currently testing CephFS on a small (3 node) cluster.
>> >>>>
>> >>>> The setup is currently:
>> >>>>
>> >>>> Each server has 12 OSDs, 1 Monitor and 1 MDS running on it:
>> >>>> The servers are running: 0.94.2-0.el7
>> >>>> The clients are running: Ceph: 0.80.10-1.fc21, Kernel: 4.0.6-200.fc21.x86_64
>> >>>>
>> >>>> ceph -s
>> >>>>     cluster 4ed5ecdd-0c5b-4422-9d99-c9e42c6bd4cd
>> >>>>      health HEALTH_OK
>> >>>>      monmap e1: 3 mons at {ceph1=10.15.0.1:6789/0,ceph2=10.15.0.2:6789/0,ceph3=10.15.0.3:6789/0}
>> >>>>             election epoch 20, quorum 0,1,2 ceph1,ceph2,ceph3
>> >>>>      mdsmap e12: 1/1/1 up {0=ceph3=up:active}, 2 up:standby
>> >>>>      osdmap e389: 36 osds: 36 up, 36 in
>> >>>>       pgmap v19370: 8256 pgs, 3 pools, 51217 MB data, 14035 objects
>> >>>>             95526 MB used, 196 TB / 196 TB avail
>> >>>>                 8256 active+clean
>> >>>>
>> >>>> Our ceph.conf is relatively simple at the moment:
>> >>>>
>> >>>> cat /etc/ceph/ceph.conf
>> >>>> [global]
>> >>>> fsid = 4ed5ecdd-0c5b-4422-9d99-c9e42c6bd4cd
>> >>>> mon_initial_members = ceph1, ceph2, ceph3
>> >>>> mon_host = 10.15.0.1,10.15.0.2,10.15.0.3
>> >>>> mon_pg_warn_max_per_osd = 1000
>> >>>> auth_cluster_required = cephx
>> >>>> auth_service_required = cephx
>> >>>> auth_client_required = cephx
>> >>>> filestore_xattr_use_omap = true
>> >>>> osd_pool_default_size = 2
>> >>>>
>> >>>> When I pulled the plug on the master MDS last time (ceph1), it stopped all IO until I plugged it back in. I was under the assumption that the MDS would fail over to the other 2 MDSs and IO would continue?
>> >>>>
>> >>>> Is there something I need to do to allow the MDSs to fail over to each other without too much interruption? Or is this because of the clients' ceph version?
>> >>>
>> >>> That's quite strange. How long did you wait for it to fail over? Did
>> >>> the output of "ceph -s" (or "ceph -w", whichever) change during that
>> >>> time?
>> >>> By default the monitors should have detected the MDS was dead after 30
>> >>> seconds and put one of the other MDS nodes into replay and active.
>> >>>
>> >>> ...I wonder if this is because you lost a monitor at the same time as
>> >>> the MDS. What kind of logging do you have available from during your
>> >>> test?
>> >>> -Greg
>> >>>
>> >>>>
>> >>>> Cheers,
>> >>>>
>> >>>> Simon Hallam
>> >>>> Linux Support & Development Officer
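On Greg's logging question, a sketch of the sort of debug levels that could be gathered for the next failover attempt is below; the levels are only a suggestion, and debug 10 logs grow quickly, so they should be turned back down afterwards.

  # illustrative debug levels for capturing the next failover; revert afterwards
  [mds]
      debug mds = 10
      debug ms = 1
  [mon]
      debug mon = 10
      debug ms = 1

  # or set on a live daemon via its admin socket (run on the host where that daemon lives), e.g.:
  ceph daemon mds.ceph3 config set debug_mds 10
  ceph daemon mon.ceph1 config set debug_mon 10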