On Thu, Apr 13, 2017 at 9:27 AM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
> Hello Greg,
>
> Thank you for the clarification. One last thing: can you point me to some
> documents that describe these? I would like to better understand what's
> going on behind the curtains ...

Unfortunately I don't think anything like that really exists outside of
developer discussions and the source code itself.
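For what it's worth, the gist is small enough to sketch. The following is
not the actual Ceph code (in hammer the relevant pieces are, roughly, the
per-connection Policy in src/msg/Messenger.h and the SimpleMessenger path
that prints the "failed lossy con, dropping message" line); it is only a
simplified model of what the lossy flag changes:

    // Simplified model of lossy vs. stateful messenger connections.
    // NOT the real Ceph implementation; illustration only.
    #include <iostream>
    #include <queue>
    #include <string>

    struct Policy {
      bool lossy;  // lossy: a failed connection silently drops messages
    };

    struct Message {
      std::string payload;
    };

    class Connection {
     public:
      explicit Connection(Policy p) : policy_(p) {}

      void submit_message(const Message& m) {
        if (failed_) {
          if (policy_.lossy) {
            // The branch behind the log line in question: the connection
            // died, and because it is lossy we drop the message instead
            // of remembering it for replay.
            std::cout << "failed lossy con, dropping message: "
                      << m.payload << "\n";
            return;
          }
          // A stateful (non-lossy) connection keeps the message and
          // replays it once the session is re-established.
          pending_.push(m);
          std::cout << "connection down, queued for resend: "
                    << m.payload << "\n";
          return;
        }
        std::cout << "sent: " << m.payload << "\n";
      }

      void fail() { failed_ = true; }

     private:
      Policy policy_;
      bool failed_ = false;
      std::queue<Message> pending_;  // only used by stateful connections
    };

    int main() {
      Connection to_osd(Policy{true});   // RADOS client -> OSD: lossy
      Connection to_mds(Policy{false});  // CephFS client -> MDS: stateful

      to_osd.fail();
      to_mds.fail();

      to_osd.submit_message({"osd_op(... [read 2097152~380928] ...)"});
      to_mds.submit_message({"client_request(getattr ...)"});
    }

The reason OSD connections can afford to be lossy is that the resend logic
lives a layer up: the client's Objecter re-targets outstanding ops when the
osdmap changes, so the messenger itself doesn't need to remember anything.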
> Kind regards,
> Laszlo
>
> On 13.04.2017 16:22, Gregory Farnum wrote:
>>
>> On Thu, Apr 13, 2017 at 2:17 AM Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
>>
>> Hello Greg,
>>
>> Thank you for the answer.
>> I'm still in doubt about "lossy". What does it mean in this context?
>> I can think of different variants:
>> 1. The designer of the protocol considers the connection "lossy" from
>> the start, so connection errors are handled in a higher layer. The layer
>> that observed the failure of the connection just logs the event and lets
>> the upper layer handle it. This would support your statement 'since it's
>> a "lossy" connection we don't need to remember the message and resend it.'
>>
>> This one. :)
>> The messenger subsystem can be configured as lossy or non-lossy; all the
>> RADOS connections are lossy, since a failure frequently means we'll have
>> to retarget the operation anyway (to a different OSD). CephFS uses the
>> stateful connections a bit more.
>> -Greg
>>
>> 2. A connection is not declared "lossy" as long as it is working
>> properly. Once it has lost some packets, or some error threshold is
>> reached, we declare the connection lossy, inform the higher layer, and
>> let it decide what to do next. Compared with point 1 the actions are
>> quite similar, but the usage of "lossy" is different. At point 1 a
>> connection is always "lossy" even if it is not actually losing any
>> packets. In the second case the connection becomes "lossy" when errors
>> appear, so "lossy" is a runtime state of the connection.
>>
>> Maybe both are wrong and the truth is a third variant ... :) This is
>> what I would like to understand.
>>
>> Kind regards,
>> Laszlo
>>
>> On 13.04.2017 00:36, Gregory Farnum wrote:
>> > On Wed, Apr 12, 2017 at 3:00 AM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
>> >> Hello,
>> >>
>> >> yesterday one of our compute nodes recorded the following message for
>> >> one of the ceph connections:
>> >>
>> >> submit_message osd_op(client.28817736.0:690186
>> >> rbd_data.15c046b11ab57b7.00000000000000c4 [read 2097152~380928] 3.6f81364a
>> >> ack+read+known_if_redirected e3617) v5 remote, 10.12.68.71:6818/6623, failed
>> >> lossy con, dropping message
>> >
>> > A read message, sent to the OSD at IP 10.12.68.71:6818/6623, is being
>> > dropped because the connection has somehow failed; since it's a
>> > "lossy" connection we don't need to remember the message and resend
>> > it. That failure could be an actual TCP/IP stack error; it could be
>> > because a different thread killed the connection and it's now closed.
>> >
>> > If you've just got one of these and didn't see other problems, it's
>> > innocuous — I expect the most common cause for this is an OSD getting
>> > marked down while IO is pending to it. :)
>> > -Greg
>> >
>> >> Can someone "decode" the above message, or direct me to some document
>> >> where I could read more about it?
>> >>
>> >> We have ceph 0.94.10.
>> >>
>> >> Thank you,
>> >> Laszlo

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com