Re: "session established", "io error", "session lost, hunting for new mon" solution/fix

Thanks Ilya for explaining. Am I correct to understand from the link [0] 
mentioned in the issue that because, e.g., I have had an unhealthy state 
for some time (1 PG on an insignificant pool), I have larger osdmaps, 
triggering this issue? Or is it just random bad luck? (Just a bit curious 
why I have this issue.)

[0]
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg51522.html

-----Original Message-----
Subject: Re: "session established", "io error", "session lost, hunting for new mon" solution/fix

On Fri, Jul 12, 2019 at 12:33 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>
>
>
> On Thu, Jul 11, 2019 at 11:36 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>> Anyone know why I would get these? Is it not strange to get them in a
>> 'standard' setup?
>
> you are probably running on an ancient kernel. this bug has been fixed
> a long time ago.

This is not a kernel bug:

http://tracker.ceph.com/issues/38040

It is possible to hit this with few OSDs too.  The actual problem is the size 
of the osdmap message, which can contain multiple full osdmaps, not the 
number of OSDs.  The size of a full osdmap is proportional to the number 
of OSDs, but that is not the only way to get a big osdmap message.
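
(If you want a feel for the numbers on your own cluster, a rough check, 
assuming the standard ceph CLI and osdmaptool are available, is to dump 
one full osdmap and look at its size:

  ceph osd getmap -o /tmp/osdmap          # fetch the current full osdmap
  ls -lh /tmp/osdmap                      # size of a single full map on disk
  osdmaptool /tmp/osdmap --print | head   # epoch, pools, number of OSDs

A single osdmap message can carry many such full maps back to back, which 
is where the total size blows up.)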

As you have experienced, these settings used to be expressed in numbers of 
osdmaps, and our defaults were too high for a stream of full osdmaps (as 
opposed to incrementals).  They are now expressed in bytes; the patch 
should be in 12.2.13.
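
For reference, the runtime check/override on a release that has the patch 
would look something like the lines below; the option name 
(osd_map_message_max_bytes) and its default are from memory, so verify 
with "config show" on your version:

  # byte-based cap on the size of a single osdmap message
  # (option name and 10 MiB default assumed here, not verified)
  ceph daemon mon.a config show | grep osd_map_message
  ceph tell mon.* injectargs '--osd_map_message_max_bytes=10485760'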

>
>> -----Original Message-----
>> Subject: "session established", "io error", "session lost, hunting for new mon" solution/fix
>>
>>
>> I have this on a cephfs client again (luminous cluster, CentOS 7, only 
>> 32 OSDs!). Wanted to share the 'fix':
>>
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
>>
>> 1) I blocked client access to the monitors with:
>>    iptables -I INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>> Resulting in:
>>
>> [Thu Jul 11 12:34:16 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:18 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:22 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:26 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:27 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:28 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:30 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:30 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:34 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:42 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:44 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:45 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:46 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)
>>
>> 2) I applied the suggested changes to the osd map message max,
>> mentioned in earlier threads [0]:
>> ceph tell osd.* injectargs '--osd_map_message_max=10'
>> ceph tell mon.* injectargs '--osd_map_message_max=10'
>> [@c01 ~]# ceph daemon osd.0 config show|grep message_max
>>     "osd_map_message_max": "10",
>> [@c01 ~]# ceph daemon mon.a config show|grep message_max
>>     "osd_map_message_max": "10",
>>
>> [0]
>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg54419.html
>> http://tracker.ceph.com/issues/38040
>>
>> 3) Allowed access to a monitor again with:
>> iptables -D INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>>
>> Getting
>> [Thu Jul 11 12:39:26 2019] libceph: mon0 192.168.10.111:6789 session established
>> [Thu Jul 11 12:39:26 2019] libceph: osd0 down
>> [Thu Jul 11 12:39:26 2019] libceph: osd0 up
>>
>> Problem solved; the unmount that was hung in D state was released.
>>
>> I am not sure whether the prolonged disconnection from the monitors was 
>> the solution, or osd_map_message_max=10, or both.
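
A side note on 2): injectargs only changes the running daemons, so the 
lower value is lost on restart. To keep it on luminous you would also put 
it in ceph.conf on the mon and osd hosts, roughly like this (untested 
sketch):

  [global]
  # persist the lower cap on osdmaps per message across daemon restarts
  osd_map_message_max = 10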

Thanks,

                Ilya

