Thanks Ilya for explaining. Am I correct to understand from the link[0]
mentioned in the issue that, because I had e.g. an unhealthy state for
some time (1 PG on an insignificant pool), I have larger osdmaps,
triggering this issue? Or is it just random bad luck? (Just a bit
curious why I am hitting this.)

[0] https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg51522.html

-----Original Message-----
Subject: Re: "session established", "io error", "session lost, hunting
for new mon" solution/fix

On Fri, Jul 12, 2019 at 12:33 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>
> On Thu, Jul 11, 2019 at 11:36 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>> Anyone know why I would get these? Is it not strange to get them in a
>> 'standard' setup?
>
> You are probably running on an ancient kernel; this bug has been fixed
> a long time ago.

This is not a kernel bug: http://tracker.ceph.com/issues/38040

It is possible to hit this with few OSDs too. The actual problem is the
size of the osdmap message, which can contain multiple full osdmaps,
not the number of OSDs. The size of a full osdmap is proportional to
the number of OSDs, but that is not the only way to get a big osdmap
message. As you have experienced, these settings used to be expressed
in the number of osdmaps, and our defaults were too high for a stream
of full osdmaps (as opposed to incrementals). The limit is now
expressed in bytes; the patch should be in 12.2.13.
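For clusters already on a release that carries that fix, the limit is
byte-based. A quick way to check what a running daemon is actually
using (assuming the byte-based option added by the patch is named
osd_map_message_max_bytes; verify the exact name against your release):

  # Show the osdmap message limits a running mon and osd are using.
  # osd_map_message_max_bytes is assumed here; check your release's option list.
  ceph daemon mon.a config show | grep osd_map_message_max
  ceph daemon osd.0 config show | grep osd_map_message_max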
>> -----Original Message-----
>> Subject: "session established", "io error", "session lost, hunting
>> for new mon" solution/fix
>>
>> I am seeing this on a CephFS client again (Luminous cluster, CentOS 7,
>> only 32 OSDs!). Wanted to share the 'fix'.
>>
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
>>
>> 1) I blocked client access to the monitors with
>>    iptables -I INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>> Resulting in
>>
>> [Thu Jul 11 12:34:16 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:18 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:22 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:26 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:27 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:28 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:30 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:30 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:34 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:42 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:44 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:45 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:46 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)
>>
>> 2) I applied the suggested changes to the osd map message max,
>>    mentioned in earlier threads[0]
>>    ceph tell osd.* injectargs '--osd_map_message_max=10'
>>    ceph tell mon.* injectargs '--osd_map_message_max=10'
>>    [@c01 ~]# ceph daemon osd.0 config show | grep message_max
>>        "osd_map_message_max": "10",
>>    [@c01 ~]# ceph daemon mon.a config show | grep message_max
>>        "osd_map_message_max": "10",
>>
>> [0] https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg54419.html
>>     http://tracker.ceph.com/issues/38040
>>
>> 3) Allowed client access to a monitor again with
>>    iptables -D INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>>
>> Getting
>> [Thu Jul 11 12:39:26 2019] libceph: mon0 192.168.10.111:6789 session established
>> [Thu Jul 11 12:39:26 2019] libceph: osd0 down
>> [Thu Jul 11 12:39:26 2019] libceph: osd0 up
>>
>> Problem solved; the unmount that was hung in D state was released.
>>
>> I am not sure whether the prolonged disconnection from the monitors
>> was the fix, or the osd_map_message_max=10, or both.

Thanks,

                Ilya
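A side note for anyone applying the workaround from step 2 above:
injectargs only changes the values in the running daemons, so the
setting is lost on a restart. A minimal sketch of making it persistent,
assuming ceph.conf is maintained by hand on the mon and osd hosts:

  # /etc/ceph/ceph.conf (excerpt): persist the osdmap message cap
  # across daemon restarts
  [global]
          osd map message max = 10

  # After editing, restart the daemons (or re-run the injectargs
  # commands) so the running values match the file.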