Several OSDs crashed: unable to bind to any port in range 6800-7300: (98) Address already in use


Hello Community 

I need help with my production Ceph cluster, where multiple OSDs are crashing after throwing this error:


2015-08-11 16:01:19.617860 7f3d95219700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
2015-08-11 16:01:19.618929 7f3d95219700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use

This is the second time I have seen this problem in the last 4 days. Earlier I restarted the OSD services and they worked at first, but today the OSDs broke again.
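
For anyone who wants to check whether the port range is really exhausted on an affected node, here is a minimal sketch (Python 3 assumed). It tries to bind each port in 6800-7300 on a given address and reports the ones that fail with "Address already in use". The function name ports_in_use is hypothetical and this is not part of Ceph; it only approximates what the OSD does when it looks for a free port.

```python
import errno
import socket


def ports_in_use(host, first=6800, last=7300):
    """Return the ports in [first, last] that cannot be bound on host.

    Hypothetical diagnostic helper, not a Ceph tool. Each free port is
    bound only briefly and released again immediately.
    """
    busy = []
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                # Same errno (98 on Linux) that appears in the OSD log.
                busy.append(port)
            else:
                # Anything else (e.g. address not configured here) is a
                # different problem, so do not hide it.
                raise
        finally:
            sock.close()
    return busy


# Example usage on an OSD node (the address is taken from the log above):
#   busy = ports_in_use("10.100.50.1")
#   print(len(busy), "of", 7300 - 6800 + 1, "ports are in use")
```

If nearly all 501 ports come back as busy, the range itself is exhausted. If only a few do, the bind failure likely has another cause.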

Here is the backtrace:


  -10> 2015-08-10 12:38:02.766359 7faa0abce700 -1 osd.60 39761 heartbeat_check: no reply from osd.33 ever on either front or back, first ping sent 2015-08-10 12:37:00.655566 (cutoff 2015-08-10 12:37:42.766354)
    -9> 2015-08-10 12:38:02.766423 7faa0abce700 -1 osd.60 39761 heartbeat_check: no reply from osd.50 ever on either front or back, first ping sent 2015-08-10 12:37:00.655566 (cutoff 2015-08-10 12:37:42.766354)
    -8> 2015-08-10 12:38:02.766433 7faa0abce700 -1 osd.60 39761 heartbeat_check: no reply from osd.134 ever on either front or back, first ping sent 2015-08-10 12:37:23.469422 (cutoff 2015-08-10 12:37:42.766354)
    -7> 2015-08-10 12:38:02.766446 7faa0abce700 -1 osd.60 39761 heartbeat_check: no reply from osd.200 ever on either front or back, first ping sent 2015-08-10 12:37:15.361731 (cutoff 2015-08-10 12:37:42.766354)
    -6> 2015-08-10 12:38:02.766454 7faa0abce700 -1 osd.60 39761 heartbeat_check: no reply from osd.228 ever on either front or back, first ping sent 2015-08-10 12:37:00.655566 (cutoff 2015-08-10 12:37:42.766354)
    -5> 2015-08-10 12:38:03.259647 7fa9b5b9a700  0 -- 10.100.50.2:0/82807 >> 10.100.50.4:7142/147030592 pipe(0x4ff3200 sd=399 :0 s=1 pgs=0 cs=0 l=1 c=0x44b3de0).fault
    -4> 2015-08-10 12:38:03.259682 7fa9b5594700  0 -- 10.100.50.2:0/82807 >> 10.100.50.1:7204/408026440 pipe(0xf278f00 sd=411 :0 s=1 pgs=0 cs=0 l=1 c=0x44b7bc0).fault
    -3> 2015-08-10 12:38:03.271675 7fa9ecda2700  0 log [WRN] : map e39763 wrongly marked me down
    -2> 2015-08-10 12:38:03.306073 7fa9ecda2700 -1 accepter.accepter.bind unable to bind to 10.100.50.2:7300 on any port in range 6800-7300: (98) Address already in use
    -1> 2015-08-10 12:38:03.368817 7fa9ecda2700  0 osd.60 39763 prepare_to_stop starting shutdown
     0> 2015-08-10 12:38:03.372071 7fa9ecda2700 -1 common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fa9ecda2700 time 2015-08-10 12:38:03.368886
common/Mutex.cc: 93: FAILED assert(r == 0)

 ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
 1: (Mutex::Lock(bool)+0x1d3) [0xa83003]
 2: (OSD::shutdown()+0x63) [0x63f3f3]
 3: (OSD::handle_osd_map(MOSDMap*)+0x1829) [0x64dff9]
 4: (OSD::_dispatch(Message*)+0x2fb) [0x6600eb]
 5: (OSD::ms_dispatch(Message*)+0x211) [0x6607b1]
 6: (DispatchQueue::entry()+0x5a2) [0xb5ac12]
 7: (DispatchQueue::DispatchThread::entry()+0xd) [0xaf23ad]
 8: /lib64/libpthread.so.0() [0x35952079d1]
 9: (clone()+0x6d) [0x3594ee89dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


My Environment 

ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
Kernel : 2.6.32-431.el6.x86_64
CentOS release 6.5 (Final)
I have 4 OSD nodes, but only 2 of them have shown this error.

I have reported this under http://tracker.ceph.com/issues/12655



****************************************************************
Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/
****************************************************************

