Re: "session established", "io error", "session lost, hunting for new mon" solution/fix


 



On Fri, Jul 12, 2019 at 12:33 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>
>
>
> On Thu, Jul 11, 2019 at 11:36 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>> Anyone know why I would get these? Is it not strange to get them in a
>> 'standard' setup?
>
> You are probably running on an ancient kernel; this bug was fixed a long time ago.

This is not a kernel bug:

http://tracker.ceph.com/issues/38040

It is possible to hit this with just a few OSDs too.  The actual problem
is the size of the osdmap message, which can contain multiple full
osdmaps, not the number of OSDs.  The size of a full osdmap is
proportional to the number of OSDs, but that is not the only way to get
a big osdmap message.
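To illustrate why a count-based cap is the problem, here is a minimal sketch (the byte sizes are hypothetical, picked only to show the orders of magnitude; they are not real Ceph encodings):

```python
# Sketch: why capping osdmap messages by *count* of maps can produce
# a huge message when the maps are full maps instead of incrementals.
# All sizes below are illustrative assumptions, not measured values.

FULL_MAP_BYTES = 1_500_000      # assumed size of one full osdmap
INCREMENTAL_BYTES = 20_000      # assumed size of one incremental osdmap
OSD_MAP_MESSAGE_MAX = 40        # old-style default: cap by number of maps

def message_size(n_maps, bytes_per_map):
    """Approximate size of one osdmap message carrying n_maps maps,
    capped only by the number of maps it may contain."""
    return min(n_maps, OSD_MAP_MESSAGE_MAX) * bytes_per_map

# 40 incrementals: under a megabyte, harmless.
print(message_size(40, INCREMENTAL_BYTES))   # 800000
# 40 full maps: tens of megabytes, big enough to trip up a client.
print(message_size(40, FULL_MAP_BYTES))      # 60000000
```

The same map count yields wildly different message sizes depending on whether full maps or incrementals are being sent, which is why the cap was moved to bytes.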

As you have experienced, this setting used to be expressed as a number
of osdmaps, and our defaults were too high for a stream of full osdmaps
(as opposed to incrementals).  It is now expressed in bytes; the patch
should be in 12.2.13.
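For reference, a byte-based cap would be set roughly like this in ceph.conf; the option name and default shown are assumptions based on the tracker issue above, so check the release notes of your version before relying on them:

```
# ceph.conf -- cap osdmap messages by size instead of by map count
# (option name and default assumed from http://tracker.ceph.com/issues/38040)
[global]
osd_map_message_max_bytes = 10485760   # 10 MiB
```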

>
> Paul
>
>> -----Original Message-----
>> Subject:  "session established", "io error", "session lost,
>> hunting for new mon" solution/fix
>>
>>
>> I have this on a cephfs client again (luminous cluster, centos7, only
>> 32 osds!).  I wanted to share the 'fix'.
>>
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
>> established
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
>> lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session
>> established
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session
>> lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
>> established
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
>> lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session
>> established
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session
>> lost, hunting for new mon
>>
>> 1) I blocked client access to the monitors with
>> iptables -I INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>> Resulting in
>>
>> [Thu Jul 11 12:34:16 2019] libceph: mon1 192.168.10.112:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:18 2019] libceph: mon1 192.168.10.112:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:22 2019] libceph: mon1 192.168.10.112:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:26 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:27 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:28 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:30 2019] libceph: mon1 192.168.10.112:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:30 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:34 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:42 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:44 2019] libceph: mon0 192.168.10.111:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:45 2019] libceph: mon0 192.168.10.111:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:46 2019] libceph: mon0 192.168.10.111:6789 socket
>> closed (con state CONNECTING)
>>
>> 2) I applied the suggested changes to osd_map_message_max, mentioned
>> in earlier threads [0]:
>> ceph tell osd.* injectargs '--osd_map_message_max=10'
>> ceph tell mon.* injectargs '--osd_map_message_max=10'
>> [@c01 ~]# ceph daemon osd.0 config show|grep message_max
>>     "osd_map_message_max": "10",
>> [@c01 ~]# ceph daemon mon.a config show|grep message_max
>>     "osd_map_message_max": "10",
>>
>> [0]
>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg54419.html
>> http://tracker.ceph.com/issues/38040
>>
>> 3) I restored the client's access to the monitors with
>> iptables -D INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>>
>> Getting
>> [Thu Jul 11 12:39:26 2019] libceph: mon0 192.168.10.111:6789 session
>> established
>> [Thu Jul 11 12:39:26 2019] libceph: osd0 down
>> [Thu Jul 11 12:39:26 2019] libceph: osd0 up
>>
>> Problem solved: the unmount that was hung in D state was released.
>>
>> I am not sure whether the fix was the prolonged disconnection from
>> the monitors, or osd_map_message_max=10, or both.
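For the record, the workaround applied in the quoted message boils down to the script below (the client IP, monitor port, and option value are taken from the thread above; note that injectargs changes are runtime-only and do not survive daemon restarts):

```
#!/bin/sh
# Workaround from this thread: briefly cut the client off from the mons,
# lower osd_map_message_max, then let the client reconnect.
CLIENT=192.168.10.43

# 1) Block the client's access to the monitors (REJECT so it fails fast).
iptables -I INPUT -p tcp -s "$CLIENT" --dport 6789 -j REJECT

# 2) Lower the cap on maps per osdmap message (runtime-only change).
ceph tell osd.* injectargs '--osd_map_message_max=10'
ceph tell mon.* injectargs '--osd_map_message_max=10'

# 3) Remove the block; the client should re-establish its mon session.
iptables -D INPUT -p tcp -s "$CLIENT" --dport 6789 -j REJECT
```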

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


