On Thu, Apr 13, 2017 at 9:27 AM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
> Hello Greg,
>
> Thank you for the clarification. One last thing: can you point me to some
> documents that describe these? I would like to better understand what's
> going on behind the curtains ...

Unfortunately I don't think anything like that really exists outside of
developer discussions and the source code itself.
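For what it's worth, the gist is small enough to sketch. The following is
not the actual Ceph code (in hammer the relevant pieces are, roughly, the
per-connection Policy in src/msg/Messenger.h and the SimpleMessenger path
that prints the "failed lossy con, dropping message" line); it is only a
simplified model of what the lossy flag changes:

    // Simplified model of lossy vs. stateful messenger connections.
    // NOT the real Ceph implementation; illustration only.
    #include <iostream>
    #include <queue>
    #include <string>

    struct Policy {
      bool lossy;  // lossy: a failed connection silently drops messages
    };

    struct Message {
      std::string payload;
    };

    class Connection {
     public:
      explicit Connection(Policy p) : policy_(p) {}

      void submit_message(const Message& m) {
        if (failed_) {
          if (policy_.lossy) {
            // The branch behind the log line in question: the connection
            // died, and because it is lossy we drop the message instead
            // of remembering it for replay.
            std::cout << "failed lossy con, dropping message: "
                      << m.payload << "\n";
            return;
          }
          // A stateful (non-lossy) connection keeps the message and
          // replays it once the session is re-established.
          pending_.push(m);
          std::cout << "connection down, queued for resend: "
                    << m.payload << "\n";
          return;
        }
        std::cout << "sent: " << m.payload << "\n";
      }

      void fail() { failed_ = true; }

     private:
      Policy policy_;
      bool failed_ = false;
      std::queue<Message> pending_;  // only used by stateful connections
    };

    int main() {
      Connection to_osd(Policy{true});   // RADOS client -> OSD: lossy
      Connection to_mds(Policy{false});  // CephFS client -> MDS: stateful

      to_osd.fail();
      to_mds.fail();

      to_osd.submit_message({"osd_op(... [read 2097152~380928] ...)"});
      to_mds.submit_message({"client_request(getattr ...)"});
    }

The reason OSD connections can afford to be lossy is that the resend logic
lives a layer up: the client's Objecter re-targets outstanding ops when the
osdmap changes, so the messenger itself doesn't need to remember anything.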
> Kind regards,
> Laszlo
>
> On 13.04.2017 16:22, Gregory Farnum wrote:
>>
>> On Thu, Apr 13, 2017 at 2:17 AM Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
>>
>> Hello Greg,
>>
>> Thank you for the answer.
>> I'm still in doubt about "lossy". What does it mean in this context?
>> I can think of different variants:
>> 1. The designer of the protocol considers the connection "lossy" from
>> the start, so connection errors are handled in a higher layer. The layer
>> that observed the failure of the connection just logs the event and lets
>> the upper layer handle it. This would support your statement 'since it's
>> a "lossy" connection we don't need to remember the message and resend it.'
>>
>> This one. :)
>> The messenger subsystem can be configured as lossy or non-lossy; all the
>> RADOS connections are lossy, since a failure frequently means we'll have
>> to retarget the operation anyway (to a different OSD). CephFS uses the
>> stateful connections a bit more.
>> -Greg
>>
>> 2. A connection is not declared "lossy" as long as it is working
>> properly. Once it has lost some packets, or some error threshold is
>> reached, we declare the connection lossy, inform the higher layer, and
>> let it decide what to do next. Compared with point 1 the actions are
>> quite similar, but the usage of "lossy" is different. At point 1 a
>> connection is always "lossy" even if it is not actually losing any
>> packets. In the second case the connection becomes "lossy" when errors
>> appear, so "lossy" is a runtime state of the connection.
>>
>> Maybe both are wrong and the truth is a third variant ... :) This is
>> what I would like to understand.
>>
>> Kind regards,
>> Laszlo
>>
>> On 13.04.2017 00:36, Gregory Farnum wrote:
>> > On Wed, Apr 12, 2017 at 3:00 AM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
>> >> Hello,
>> >>
>> >> yesterday one of our compute nodes recorded the following message for
>> >> one of the ceph connections:
>> >>
>> >> submit_message osd_op(client.28817736.0:690186
>> >> rbd_data.15c046b11ab57b7.00000000000000c4 [read 2097152~380928] 3.6f81364a
>> >> ack+read+known_if_redirected e3617) v5 remote, 10.12.68.71:6818/6623, failed
>> >> lossy con, dropping message
>> >
>> > A read message, sent to the OSD at IP 10.12.68.71:6818/6623, is being
>> > dropped because the connection has somehow failed; since it's a
>> > "lossy" connection we don't need to remember the message and resend
>> > it. That failure could be an actual TCP/IP stack error; it could be
>> > because a different thread killed the connection and it's now closed.
>> >
>> > If you've just got one of these and didn't see other problems, it's
>> > innocuous — I expect the most common cause for this is an OSD getting
>> > marked down while IO is pending to it. :)
>> > -Greg
>> >
>> >> Can someone "decode" the above message, or direct me to some document
>> >> where I could read more about it?
>> >>
>> >> We have ceph 0.94.10.
>> >>
>> >> Thank you,
>> >> Laszlo

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com