Re: Help! 61.1 killed my monitors in prod

Joao,

Thanks for your response.  Sorry for the marginal quality of the original e-mail.

Better log information is included inline below.

On May 13, 2013, at 1:19 PM, Joao Eduardo Luis <joao.luis@xxxxxxxxxxx> wrote:

> On 05/13/2013 08:40 PM, Stephen Street wrote:
>> 
>> On May 10, 2013, at 3:39 PM, Joao Eduardo Luis <joao.luis@xxxxxxxxxxx> wrote:
>> 
>>> We would certainly be interested in taking a look at logs from
>>> those monitors, and would appreciate if you could set 'debug mon = 20',
>>> 'debug auth = 10' and 'debug ms = 1', and give them a spin until you
>>> hit your issue.
>> 
>> I'm seeing the same problem at Jeppesen.  I'm running 0.61.1 with 3 MON,
>> 4 OSD and 1 MDS, and a reboot of the cluster leaves it in the same state,
>> with ceph-create-keys hung and the monitors not running.  I added the
>> debug settings as indicated.  This is an excerpt from the output of
>> "ceph status
> 
> All this shows is that connections from 'ceph' to the monitors are being dropped/closed.
> 
> Assessing what's going on will require logs from the monitors with the same debug levels as stated before.
> 

From the logs, it appears that the monitors fail to bind to their network addresses at system start. If I issue an "initctl restart ceph-mon-all" on every node running a monitor, the system starts functioning correctly.
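
For reference, "Cannot assign requested address" is the Linux error text for EADDRNOTAVAIL, which bind() returns when the requested IP is not configured on a local interface at that moment. Below is a minimal Python sketch, not part of Ceph, for checking this on a node. The helper name bind_status is hypothetical, and the default port 6789 is simply the monitor port that appears in the logs.

```python
import errno
import socket


def bind_status(ip, port=6789):
    """Try to bind a TCP socket to ip:port and report the outcome.

    Returns "ok", "address-not-configured" (EADDRNOTAVAIL),
    "port-in-use" (EADDRINUSE), or the errno name for anything else.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return "ok"
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            # The IP is not assigned to any local interface (yet).
            return "address-not-configured"
        if exc.errno == errno.EADDRINUSE:
            # Something, possibly a running ceph-mon, already holds the port.
            return "port-in-use"
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()


if __name__ == "__main__":
    # Address taken from mon_host for cloud-2; adjust per node.
    print(bind_status("192.168.139.2"))
```

If the theory is right, I would expect "address-not-configured" early in boot and "ok" (or "port-in-use" once a monitor is running) after the interface is up, but I have not verified that timing.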

My ceph.conf, as generated by ceph-deploy (I added the debug settings):

[global]
fsid = f3aaf545-515c-4597-a0c2-2b08a309e944
mon_initial_members = cloud-2, cloud-3, cloud-4
mon_host = 192.168.139.2,192.168.139.3,192.168.139.4
auth_supported = cephx
osd_journal_size = 2048
filestore_xattr_use_omap = true
debug_mon = 20
debug_auth = 10
debug_ms = 1
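
To make it easy to see which address each monitor is expected to bind to, here is a small Python sketch that reads the relevant keys from the config above. It uses the standard configparser module as an approximation of Ceph's own parser, and it assumes that mon_initial_members and mon_host are listed in the same order; both are assumptions on my part. The function name monitor_addresses is hypothetical.

```python
import configparser

# Subset of the ceph.conf shown above.
CEPH_CONF = """\
[global]
fsid = f3aaf545-515c-4597-a0c2-2b08a309e944
mon_initial_members = cloud-2, cloud-3, cloud-4
mon_host = 192.168.139.2,192.168.139.3,192.168.139.4
debug_mon = 20
debug_auth = 10
debug_ms = 1
"""


def monitor_addresses(conf_text):
    """Map monitor names to IPs, pairing the two lists by position.

    Positional pairing is an assumption; Ceph itself does not
    require the two lists to be in matching order.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser["global"]
    names = [n.strip() for n in section["mon_initial_members"].split(",")]
    hosts = [h.strip() for h in section["mon_host"].split(",")]
    return dict(zip(names, hosts))


if __name__ == "__main__":
    for name, ip in monitor_addresses(CEPH_CONF).items():
        print(name, ip)
```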

First MON:

ps aux | grep ceph
root      1295  0.0  0.0  15584  1328 ?        S    22:11   0:00 initctl emit ceph-osd cluster=ceph id=1
root      1304  0.0  0.0  33676  7108 ?        Ss   22:11   0:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i cloud-2
root      1319  0.0  0.0 313364  4152 ?        Sl   22:11   0:00 ceph --cluster=ceph --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush create-or-move -- 1 0.68 root=default host=cloud-2
root      2766  0.0  0.0   9440   956 pts/0    S+   22:13   0:00 grep --color=auto ceph

root@cloud-2:~# cat /var/log/ceph/ceph-mon.cloud-2.log 
2013-05-13 22:11:16.776048 7f18ddca87c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1302
2013-05-13 22:11:16.783543 7f18ddca87c0 10 needs_conversion
2013-05-13 22:11:16.805741 7f18d9a9d700 -1 asok(0x2cf80e0) AdminSocket: request 'mon_status' not defined
2013-05-13 22:11:16.814564 7f18ddca87c0 10 obtain_monmap
2013-05-13 22:11:16.814605 7f18ddca87c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.814683 7f18ddca87c0 -1 accepter.accepter.bind unable to bind to 192.168.139.2:6789: Cannot assign requested address
2013-05-13 22:11:16.821721 7fc9c91757c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1338
2013-05-13 22:11:16.822949 7fc9c91757c0 10 needs_conversion
2013-05-13 22:11:16.827517 7fc9c91757c0 10 obtain_monmap
2013-05-13 22:11:16.827563 7fc9c91757c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.827636 7fc9c91757c0 -1 accepter.accepter.bind unable to bind to 192.168.139.2:6789: Cannot assign requested address
2013-05-13 22:11:16.834545 7fb0c3fc27c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1347
2013-05-13 22:11:16.835990 7fb0c3fc27c0 10 needs_conversion
2013-05-13 22:11:16.841349 7fb0c3fc27c0 10 obtain_monmap
2013-05-13 22:11:16.841388 7fb0c3fc27c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.841470 7fb0c3fc27c0 -1 accepter.accepter.bind unable to bind to 192.168.139.2:6789: Cannot assign requested address
2013-05-13 22:11:16.848379 7f593c5037c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1354
2013-05-13 22:11:16.849630 7f593c5037c0 10 needs_conversion
2013-05-13 22:11:16.854367 7f593c5037c0 10 obtain_monmap
2013-05-13 22:11:16.854400 7f593c5037c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.854471 7f593c5037c0 -1 accepter.accepter.bind unable to bind to 192.168.139.2:6789: Cannot assign requested address
2013-05-13 22:11:16.861371 7fe46afba7c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1362
2013-05-13 22:11:16.862730 7fe46afba7c0 10 needs_conversion
2013-05-13 22:11:16.867683 7fe46afba7c0 10 obtain_monmap
2013-05-13 22:11:16.867722 7fe46afba7c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.867804 7fe46afba7c0 -1 accepter.accepter.bind unable to bind to 192.168.139.2:6789: Cannot assign requested address
2013-05-13 22:11:16.874695 7faadb1b57c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1369
2013-05-13 22:11:16.875922 7faadb1b57c0 10 needs_conversion
2013-05-13 22:11:16.880680 7faadb1b57c0 10 obtain_monmap
2013-05-13 22:11:16.880718 7faadb1b57c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.880797 7faadb1b57c0 -1 accepter.accepter.bind unable to bind to 192.168.139.2:6789: Cannot assign requested address

Second MON:

ps aux | grep ceph
root      1288  0.0  0.0  15584  1332 ?        S    22:11   0:00 initctl emit ceph-osd cluster=ceph id=2
root      1293  0.0  0.0  33676  7108 ?        Ss   22:11   0:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i cloud-3
root      1313  0.0  0.0 313364  4156 ?        Sl   22:11   0:00 ceph --cluster=ceph --name=osd.2 --keyring=/var/lib/ceph/osd/ceph-2/keyring osd crush create-or-move -- 2 0.68 root=default host=cloud-3
root      3713  0.0  0.0   9440   956 pts/1    S+   22:14   0:00 grep --color=auto ceph

root@cloud-3:~# cat /var/log/ceph/ceph-mon.cloud-3.log 
2013-05-13 22:11:22.957239 7f5c8ad1e7c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1291
2013-05-13 22:11:22.958858 7f5c8ad1e7c0 10 needs_conversion
2013-05-13 22:11:22.971652 7f5c86b13700 -1 asok(0x2a980e0) AdminSocket: request 'mon_status' not defined
2013-05-13 22:11:22.976992 7f5c8ad1e7c0 10 obtain_monmap
2013-05-13 22:11:22.977033 7f5c8ad1e7c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:22.977110 7f5c8ad1e7c0 -1 accepter.accepter.bind unable to bind to 192.168.139.3:6789: Cannot assign requested address
2013-05-13 22:11:22.984116 7f48e9aac7c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1333
2013-05-13 22:11:22.985484 7f48e9aac7c0 10 needs_conversion
2013-05-13 22:11:22.990015 7f48e9aac7c0 10 obtain_monmap
2013-05-13 22:11:22.990069 7f48e9aac7c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:22.990152 7f48e9aac7c0 -1 accepter.accepter.bind unable to bind to 192.168.139.3:6789: Cannot assign requested address
2013-05-13 22:11:22.996957 7f8b3ac437c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1340
2013-05-13 22:11:22.998339 7f8b3ac437c0 10 needs_conversion
2013-05-13 22:11:23.004206 7f8b3ac437c0 10 obtain_monmap
2013-05-13 22:11:23.004245 7f8b3ac437c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:23.004325 7f8b3ac437c0 -1 accepter.accepter.bind unable to bind to 192.168.139.3:6789: Cannot assign requested address
2013-05-13 22:11:23.011281 7fcc636f97c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1347
2013-05-13 22:11:23.012754 7fcc636f97c0 10 needs_conversion
2013-05-13 22:11:23.017546 7fcc636f97c0 10 obtain_monmap
2013-05-13 22:11:23.017586 7fcc636f97c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:23.017667 7fcc636f97c0 -1 accepter.accepter.bind unable to bind to 192.168.139.3:6789: Cannot assign requested address
2013-05-13 22:11:23.024528 7fc3823a37c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1354
2013-05-13 22:11:23.025990 7fc3823a37c0 10 needs_conversion
2013-05-13 22:11:23.030885 7fc3823a37c0 10 obtain_monmap
2013-05-13 22:11:23.030925 7fc3823a37c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:23.030997 7fc3823a37c0 -1 accepter.accepter.bind unable to bind to 192.168.139.3:6789: Cannot assign requested address
2013-05-13 22:11:23.037641 7fca36aeb7c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1361
2013-05-13 22:11:23.039020 7fca36aeb7c0 10 needs_conversion
2013-05-13 22:11:23.044057 7fca36aeb7c0 10 obtain_monmap
2013-05-13 22:11:23.044097 7fca36aeb7c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:23.044174 7fca36aeb7c0 -1 accepter.accepter.bind unable to bind to 192.168.139.3:6789: Cannot assign requested address

Third MON:

ps aux | grep ceph
root      1261  0.0  0.0  15584  1328 ?        S    22:11   0:00 initctl emit ceph-osd cluster=ceph id=3
root      1265  0.0  0.0  33676  7108 ?        Ss   22:11   0:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i cloud-4
root      1285  0.0  0.0 313364  4156 ?        Sl   22:11   0:00 ceph --cluster=ceph --name=osd.3 --keyring=/var/lib/ceph/osd/ceph-3/keyring osd crush create-or-move -- 3 0.68 root=default host=cloud-4
root      4293  0.0  0.0   9440   956 pts/0    S+   22:15   0:00 grep --color=auto ceph

root@cloud-4:~# cat /var/log/ceph/ceph-mon.cloud-4.log 
2013-05-13 22:11:16.789076 7f83e6a897c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1263
2013-05-13 22:11:16.790379 7f83e6a897c0 10 needs_conversion
2013-05-13 22:11:16.801884 7f83e287e700 -1 asok(0x16ea0e0) AdminSocket: request 'mon_status' not defined
2013-05-13 22:11:16.809678 7f83e6a897c0 10 obtain_monmap
2013-05-13 22:11:16.809722 7f83e6a897c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.809817 7f83e6a897c0 -1 accepter.accepter.bind unable to bind to 192.168.139.4:6789: Cannot assign requested address
2013-05-13 22:11:16.816827 7f515f2da7c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1303
2013-05-13 22:11:16.818084 7f515f2da7c0 10 needs_conversion
2013-05-13 22:11:16.822703 7f515f2da7c0 10 obtain_monmap
2013-05-13 22:11:16.822745 7f515f2da7c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.822824 7f515f2da7c0 -1 accepter.accepter.bind unable to bind to 192.168.139.4:6789: Cannot assign requested address
2013-05-13 22:11:16.829328 7fadc45ed7c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1312
2013-05-13 22:11:16.830575 7fadc45ed7c0 10 needs_conversion
2013-05-13 22:11:16.835661 7fadc45ed7c0 10 obtain_monmap
2013-05-13 22:11:16.835720 7fadc45ed7c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.835806 7fadc45ed7c0 -1 accepter.accepter.bind unable to bind to 192.168.139.4:6789: Cannot assign requested address
2013-05-13 22:11:16.842621 7ff373aea7c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1319
2013-05-13 22:11:16.843859 7ff373aea7c0 10 needs_conversion
2013-05-13 22:11:16.848741 7ff373aea7c0 10 obtain_monmap
2013-05-13 22:11:16.848784 7ff373aea7c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.848866 7ff373aea7c0 -1 accepter.accepter.bind unable to bind to 192.168.139.4:6789: Cannot assign requested address
2013-05-13 22:11:16.855608 7fd934e937c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1326
2013-05-13 22:11:16.856987 7fd934e937c0 10 needs_conversion
2013-05-13 22:11:16.862117 7fd934e937c0 10 obtain_monmap
2013-05-13 22:11:16.862151 7fd934e937c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.862219 7fd934e937c0 -1 accepter.accepter.bind unable to bind to 192.168.139.4:6789: Cannot assign requested address
2013-05-13 22:11:16.869042 7f0cc0f5f7c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1333
2013-05-13 22:11:16.870404 7f0cc0f5f7c0 10 needs_conversion
2013-05-13 22:11:16.878106 7f0cc0f5f7c0 10 obtain_monmap
2013-05-13 22:11:16.878143 7f0cc0f5f7c0 10 obtain_monmap read last committed monmap ver 1
2013-05-13 22:11:16.878212 7f0cc0f5f7c0 -1 accepter.accepter.bind unable to bind to 192.168.139.4:6789: Cannot assign requested address
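
Each of the three logs above shows six ceph-mon start attempts within roughly 100 ms, every one ending in the same bind failure. A rough Python sketch for summarizing such a log follows; the regular expressions are written against the line format shown in this mail only, and the function name summarize is hypothetical.

```python
import re

# Patterns match the log lines quoted in this mail, not a documented format.
START_RE = re.compile(r"process ceph-mon, pid (\d+)")
BIND_RE = re.compile(r"unable to bind to (\S+): Cannot assign requested address")

# Four lines copied from the cloud-2 log above.
SAMPLE = """\
2013-05-13 22:11:16.776048 7f18ddca87c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1302
2013-05-13 22:11:16.814683 7f18ddca87c0 -1 accepter.accepter.bind unable to bind to 192.168.139.2:6789: Cannot assign requested address
2013-05-13 22:11:16.821721 7fc9c91757c0  0 ceph version 0.61.1 (56c4847ba82a92023700e2d4920b59cdaf23428d), process ceph-mon, pid 1338
2013-05-13 22:11:16.827636 7fc9c91757c0 -1 accepter.accepter.bind unable to bind to 192.168.139.2:6789: Cannot assign requested address
"""


def summarize(log_text):
    """Return (list of ceph-mon pids started, bind failures per address)."""
    pids = []
    failures = {}
    for line in log_text.splitlines():
        started = START_RE.search(line)
        if started:
            pids.append(int(started.group(1)))
        failed = BIND_RE.search(line)
        if failed:
            addr = failed.group(1)
            failures[addr] = failures.get(addr, 0) + 1
    return pids, failures


if __name__ == "__main__":
    pids, failures = summarize(SAMPLE)
    print("start attempts:", len(pids), "bind failures:", failures)
```

Running it against a full /var/log/ceph/ceph-mon.*.log file instead of SAMPLE should make it quick to confirm whether every start attempt on a node failed the same way.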

What additional information do you need?

Thanks
Stephen



