Hi Nick,

thank you for your reply! Indeed, jumbo frames were not activated. Ping and everything else was working, so I thought the network was up, but not with a large enough MTU...

The f... Supermicro switch just deleted its switch config, so I had to recreate everything and forgot about the MTU on the uplink ports.

Thank you!

--
Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Address:
IP Interactive UG (haftungsbeschraenkt)
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 at the Amtsgericht Hanau (Local Court)
Managing Director: Oliver Dzombic

Tax No.: 35 236 3622 1
VAT ID: DE274086107

On 30.09.2016 at 15:46, Nick Fisk wrote:
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Oliver Dzombic
>> Sent: 30 September 2016 14:16
>> To: ceph-users@xxxxxxxxxxxxxx
>> Subject: production cluster down :(
>>
>> Hi,
>>
>> we have:
>>
>> ceph version 10.2.2
>>
>>      health HEALTH_ERR
>>             2240 pgs are stuck inactive for more than 300 seconds
>>             273 pgs down
>>             2240 pgs peering
>>             2240 pgs stuck inactive
>>             354 requests are blocked > 32 sec
>>             mds cluster is degraded
>>      monmap e1: 3 mons at {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
>>             election epoch 146, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>>       fsmap e114: 1/1/1 up {0=cephmon1=up:replay}
>>      osdmap e2322: 24 osds: 24 up, 24 in; 2230 remapped pgs
>>             flags sortbitwise
>>       pgmap v8774321: 2240 pgs, 4 pools, 9997 GB data, 2629 kobjects
>>             34753 GB used, 19173 GB / 53926 GB avail
>>                 1957 remapped+peering
>>                  273 down+remapped+peering
>>                   10 peering
>>
>>
>> health detail:
>>
>> http://pastebin.com/GsQcG2U0
>>
>>
>> Sample log from one OSD:
>>
>>
>>
>> 2016-09-30 15:01:07.066632 7f2b65d70700 0 log_channel(cluster) log [WRN] : 2 slow requests, 1 included below; oldest blocked for > 659.155019 secs
>> 2016-09-30 15:01:07.066643 7f2b65d70700 0 log_channel(cluster) log [WRN] : slow request 480.599877 seconds old, received at 2016-09-30 14:53:06.466705: osd_op(mds.0.114:4 5.64e96f8f (undecoded) ack+read+known_if_redirected+full_force e2320) currently waiting for ack+read+peered
>> 2016-09-30 15:05:06.894995 7f2b35c8c700 0 -- 10.0.1.15:6810/8033 >> 10.0.1.16:6800/1679 pipe(0x7f2b9fc50800 sd=146 :6810 s=0 pgs=0 cs=0 l=0 c=0x7f2b9eaf1800).accept connect_seq 2 vs existing 1 state open
>> 2016-09-30 15:05:06.895558 7f2b39fcf700 0 -- 10.0.1.15:6810/8033 >> 10.0.1.16:6822/13278 pipe(0x7f2b9f199400 sd=207 :59416 s=2 pgs=47 cs=1 l=0 c=0x7f2b9f247d80).fault, initiating reconnect
>> 2016-09-30 15:05:06.895618 7f2b3a5d5700 0 -- 10.0.1.15:6810/8033 >> 10.0.1.16:6822/13278 pipe(0x7f2b9f199400 sd=207 :59416 s=1 pgs=47 cs=2 l=0 c=0x7f2b9f247d80).fault
>
> Not sure how much help I can provide, but are you sure all your networking is working 100% between all your OSD nodes? Can you see anything in the log of this 10.0.1.16 node that it's trying to connect to?
>
>
>>
>> MDS:
>>
>> 2016-09-30 14:53:05.112007 7f150e599180 0 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-mds, pid 1092
>> 2016-09-30 14:53:05.113631 7f150e599180 0 pidfile_write: ignore empty --pid-file
>> 2016-09-30 14:53:06.455957 7f1508574700 1 mds.cephmon1 handle_mds_map standby
>> 2016-09-30 14:53:06.467568 7f1508574700 1 mds.0.114 handle_mds_map i am now mds.0.114
>> 2016-09-30 14:53:06.467575 7f1508574700 1 mds.0.114 handle_mds_map state change up:boot --> up:replay
>> 2016-09-30 14:53:06.467591 7f1508574700 1 mds.0.114 replay_start
>> 2016-09-30 14:53:06.467683 7f1508574700 1 mds.0.114 recovery set is
>>
>>
>>
>> I already restarted Ceph.
>>
>> Nothing helped.
>>
>> I have basically no idea what to do now.
>>
>> Any help is greatly appreciated!
>>
>> Thank you!
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
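Because small default pings succeeded while the jumbo-frame path through the uplinks was broken, one way to catch this kind of mismatch is a ping with the don't-fragment bit set and a payload sized for the full MTU. The Python sketch below only builds such a command. It assumes an MTU of 9000 (the thread does not state the configured value), IPv4 without header options (20-byte IP header plus 8-byte ICMP header), and the Linux iputils `ping`, where `-M do` forbids fragmentation and `-s` sets the payload size. The helper names `max_icmp_payload` and `df_ping_command` are hypothetical.

```python
"""Hypothetical helper for checking jumbo-frame connectivity between OSD nodes."""
import shlex
from typing import List

IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo request")
    return payload


def df_ping_command(host: str, mtu: int, count: int = 3) -> List[str]:
    """Build a Linux iputils ping command that forbids fragmentation (-M do)."""
    return ["ping", "-M", "do", "-c", str(count),
            "-s", str(max_icmp_payload(mtu)), host]


if __name__ == "__main__":
    # Example peer address taken from the OSD log in this thread;
    # assumes the intended MTU on the cluster network is 9000.
    print(shlex.join(df_ping_command("10.0.1.16", 9000)))
```

With an MTU of 9000 this yields a payload of 8972 bytes (9000 - 20 - 8). If any port along the path still has a smaller MTU, that command should fail or time out while a plain `ping 10.0.1.16` still succeeds, which matches the symptom described above.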