On Fri, Jul 12, 2019 at 12:33 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>
> On Thu, Jul 11, 2019 at 11:36 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>> Anyone know why I would get these? Is it not strange to get them in a
>> 'standard' setup?
>
> you are probably running on an ancient kernel. this bug has been fixed
> a long time ago.

This is not a kernel bug:

    http://tracker.ceph.com/issues/38040

It is possible to hit with just a few OSDs too. The actual problem is the
size of the osdmap message, which can contain multiple full osdmaps, not
the number of OSDs. The size of a full osdmap is proportional to the
number of OSDs, but that is not the only way to end up with a big osdmap
message. As you have experienced, these settings used to be expressed as
a number of osdmaps, and our defaults were too high for a stream of full
osdmaps (as opposed to incrementals). The limit is now expressed in
bytes; the patch should be in 12.2.13.

> Paul
>
>> -----Original Message-----
>> Subject: "session established", "io error", "session lost,
>> hunting for new mon" solution/fix
>>
>> I have this on a cephfs client again (luminous cluster, centos7, only
>> 32 osds!).
>> Wanted to share the 'fix'.
>>
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session established
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
>>
>> 1) I blocked client access to the monitors with:
>>
>>    iptables -I INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>>
>> resulting in:
>>
>> [Thu Jul 11 12:34:16 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:18 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:22 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:26 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:27 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:28 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:30 2019] libceph: mon1 192.168.10.112:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:30 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:34 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:42 2019] libceph: mon2 192.168.10.113:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:44 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:45 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)
>> [Thu Jul 11 12:34:46 2019] libceph: mon0 192.168.10.111:6789 socket closed (con state CONNECTING)
>>
>> 2) I applied the suggested changes to osd_map_message_max, mentioned
>> in earlier threads [0]:
>>
>>    ceph tell osd.* injectargs '--osd_map_message_max=10'
>>    ceph tell mon.* injectargs '--osd_map_message_max=10'
>>
>>    [@c01 ~]# ceph daemon osd.0 config show | grep message_max
>>        "osd_map_message_max": "10",
>>    [@c01 ~]# ceph daemon mon.a config show | grep message_max
>>        "osd_map_message_max": "10",
>>
>> [0] https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg54419.html
>>     http://tracker.ceph.com/issues/38040
>>
>> 3) I allowed access to a monitor again with:
>>
>>    iptables -D INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>>
>> getting:
>>
>> [Thu Jul 11 12:39:26 2019] libceph: mon0 192.168.10.111:6789 session established
>> [Thu Jul 11 12:39:26 2019] libceph: osd0 down
>> [Thu Jul 11 12:39:26 2019] libceph: osd0 up
>>
>> Problem solved; the unmount that was hung in D state was released.
>>
>> I am not sure whether the prolonged disconnection from the monitors
>> was the solution, or the osd_map_message_max=10, or both.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
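[Editor's sketch] The count-vs-bytes point above can be illustrated with a
little arithmetic. This is not Ceph source code; the per-map sizes and the
10 MiB byte budget below are assumed round numbers for illustration only.

```python
# Illustrative sketch (not Ceph internals): why capping an osdmap message
# by a *count* of maps blows up for full maps, while a *byte* cap does not.
# All per-map sizes here are assumed round numbers for illustration.

FULL_MAP_BYTES_PER_OSD = 1024      # assumed: a full osdmap grows with OSD count
INCREMENTAL_MAP_BYTES = 4096       # assumed: an incremental map stays small

def per_map_bytes(num_osds, full):
    """Assumed size of a single osdmap for a cluster of num_osds OSDs."""
    return FULL_MAP_BYTES_PER_OSD * num_osds if full else INCREMENTAL_MAP_BYTES

def message_bytes_count_capped(map_cap, num_osds, full):
    """Old scheme: always pack up to map_cap maps into one message."""
    return map_cap * per_map_bytes(num_osds, full)

def maps_per_message_byte_capped(byte_cap, num_osds, full):
    """New scheme: pack as many maps as the byte budget allows (at least 1)."""
    return max(1, byte_cap // per_map_bytes(num_osds, full))

# A count cap of 40 maps is harmless for incrementals, but a stream of
# full maps scales linearly with cluster size:
print(message_bytes_count_capped(40, 32, full=False))    # 163840 (~160 KiB)
print(message_bytes_count_capped(40, 32, full=True))     # 1310720 (~1.25 MiB)
print(message_bytes_count_capped(40, 10000, full=True))  # 409600000 (~390 MiB)

# A byte cap (10 MiB assumed) self-adjusts the map count instead:
print(maps_per_message_byte_capped(10 * 2**20, 32, full=True))     # 320
print(maps_per_message_byte_capped(10 * 2**20, 10000, full=True))  # 1
```

Under these assumptions the message size stays bounded no matter how large
an individual full map gets, which is the point of expressing the limit in
bytes rather than as a number of osdmaps.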