Re: Socket errors, CRC, lossy con messages

Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> · Mon, 17 Apr 2017 07:42:12 -0400

On Thu, Apr 13, 2017 at 4:24 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Thu, Apr 13, 2017 at 5:39 AM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>> On Wed, Apr 12, 2017 at 10:51 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>> On Wed, Apr 12, 2017 at 4:28 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>>> Hi Ilya,
>>>>
>>>> On Wed, Apr 12, 2017 at 4:58 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>>>
>>>>> On Tue, Apr 11, 2017 at 3:10 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>
>>>>> wrote:
>>>>> > Hi Ilya,
>>>>> >
>>>>> > On Tue, Apr 11, 2017 at 4:06 AM, Ilya Dryomov <idryomov@xxxxxxxxx>
>>>>> > wrote:
>>>>> >> On Tue, Apr 11, 2017 at 4:01 AM, Alex Gorbachev
>>>>> >> <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>> >>> On Mon, Apr 10, 2017 at 2:16 PM, Alex Gorbachev
>>>>> >>> <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>> >>>> I am trying to understand the cause of a problem we started
>>>>> >>>> encountering a few weeks ago.  There are 30 or so per hour messages
>>>>> >>>> on
>>>>> >>>> OSD nodes of type:
>>>>> >>>>
>>>>> >>>> ceph-osd.33.log:2017-04-10 13:42:39.935422 7fd7076d8700  0 bad crc in
>>>>> >>>> data 2227614508 != exp 2469058201
>>>>> >>>>
>>>>> >>>> and
>>>>> >>>>
>>>>> >>>> 2017-04-10 13:42:39.939284 7fd722c42700  0 -- 10.80.3.25:6826/5752
>>>>> >>>> submit_message osd_op_reply(1826606251
>>>>> >>>> rbd_data.922d95238e1f29.00000000000101bf [set-alloc-hint object_size
>>>>> >>>> 16777216 write_size 16777216,write 6328320~12288] v103574'18626765
>>>>> >>>> uv18626765 ondisk = 0) v6 remote, 10.80.3.216:0/1934733503, failed
>>>>> >>>> lossy con, dropping message 0x3b55600
>>>>> >>>>
>>>>> >>>> On a client sometimes, but not corresponding to the above:
>>>>> >>>>
>>>>> >>>> Apr 10 11:53:15 roc-5r-scd216 kernel: [4906599.023174] libceph: osd96
>>>>> >>>> 10.80.3.25:6822 socket error on write
>>>>> >>>>
>>>>> >>>> And from time to time, slow requests:
>>>>> >>>>
>>>>> >>>> 2017-04-10 13:00:04.280686 osd.91 10.80.3.45:6808/5665 231 : cluster
>>>>> >>>> [WRN] slow request 30.108325 seconds old, received at 2017-04-10
>>>>> >>>> 12:59:34.172283: osd_op(client.11893449.1:324079247
>>>>> >>>> rbd_data.8fcdfb238e1f29.00000000000187e7 [set-alloc-hint object_size
>>>>> >>>> 16777216 write_size 16777216,write 10772480~8192] 14.ed0bcdec
>>>>> >>>> ondisk+write e103545) currently waiting for subops from 2,104
>>>>> >>>> 2017-04-10 13:00:06.280949 osd.91 10.80.3.45:6808/5665 232 : cluster
>>>>> >>>> [WRN] 2 slow requests, 1 included below; oldest blocked for >
>>>>> >>>> 32.108610 secs
>>>>> >>>>
>>>>> >>>> Questions:
>>>>> >>>>
>>>>> >>>> 1. Is there any way to drill further into the "bad crc" message?
>>>>> >>>> sometimes they have nothing before or after them, but how to
>>>>> >>>> determine
>>>>> >>>> from/to what this came from - another OSD, client, which one?
>>>>> >>>>
>>>>> >>>> 2. Network seems OK - no errors on NICs, regression testing does not
>>>>> >>>> show any issues.  I realize this can be disk response, but using
>>>>> >>>> Christian Balzer's atop recommendation shows a pretty normal system.
>>>>> >>>> What is my best course of troubleshooting here - dump historic ops on
>>>>> >>>> OSD, wireshark the links or anything else?
>>>>> >>>>
>>>>> >>>> 3. Christian, if you are looking at this, what would be your red
>>>>> >>>> flags in atop?
>>>>> >>>>
>>>>> >>>
>>>>> >>> One more note: OSD nodes are running kernel 4.10.2-041002-generic and
>>>>> >>> clients - 4.4.23-040423-generic
>>>>> >>
>>>>> >> Hi Alex,
>>>>> >>
>>>>> >> Did you upgrade the kernel client from < 4.4 to 4.4 somewhere in that
>>>>> >> time
>>>>> >> frame by any chance?
>>>>> >
>>>>> > Yes, they were upgraded from 4.2.8 to 4.4.23 in October.
>>>>>
>>>>> There is a block layer bug in 4.4 and later kernels [1].  It
>>>>> effectively undoes krbd commit [2], which prevents pages from being
>>>>> further updated while in-flight.  The timeline doesn't fit though, so
>>>>> it's probably unrelated -- not every workload can trigger it...
>>>>>
>>>>> [1] http://tracker.ceph.com/issues/19275
>>>>> [2]
>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bae818ee1577c27356093901a0ea48f672eda514
>>>>
>>>>
>>>> Would the workaround be to drop to 4.3?
>>>
>>> Unfortunately, short of installing a custom kernel or disabling data
>>> CRCs entirely, yes.  I'll poke the respective maintainer again later
>>> today...
>>
>> I downgraded to 4.3.6 and so far all the CRC messages and lossy con
>> went away completely.
>
> Thanks for the confirmation.  I'll update #19275 with stable kernel
> versions when it's fixed.

Thank you.  I can definitely confirm that the lossy con and crc
mismatch messages are gone.  However, the slow requests remain, and
they are of this format:

2017-04-17 07:26:35.413710 osd.111 10.80.3.25:6818/5620 587 : cluster
[WRN] slow request 34.205118 seconds old, received at 2017-04-17
07:26:01.208492: osd_op(client.11998611.1:426520806
rbd_data.85ae77238e1f29.000000000000c27c [set-alloc-hint object_size
16777216 write_size 16777216,write 208896~4096] 26.2bd4f8b9 snapc
482=[] ondisk+write e103717) currently waiting for subops from 95,101
2017-04-17 07:26:35.413716 osd.111 10.80.3.25:6818/5620 588 : cluster
[WRN] slow request 33.851564 seconds old, received at 2017-04-17
07:26:01.562046: osd_op(client.12007895.1:105126959
rbd_data.922d95238e1f29.0000000000009656 [set-alloc-hint object_size
16777216 write_size 16777216,write 16621568~4096] 13.8fd42a47 snapc
508=[508,501,4ef] ondisk+write e103717) currently waiting for subops
from 49,213
2017-04-17 07:27:46.291096 osd.174 10.80.3.85:6830/5539 189 : cluster
[WRN] 2 slow requests, 1 included below; oldest blocked for >
36.965481 secs
2017-04-17 07:27:46.291105 osd.174 10.80.3.85:6830/5539 190 : cluster
[WRN] slow request 30.201051 seconds old, received at 2017-04-17
07:27:16.090000: osd_op(client.12007895.1:105127746
rbd_data.7a6cfe238e1f29.0000000000000001 [set-alloc-hint object_size
4194304 write_size 4194304,write 0~16384] 21.258dfd1a snapc 12=[]
ondisk+write e103717) currently waiting for subops from 111
2017-04-17 07:35:31.958179 osd.42 10.80.3.105:6816/5326 2043 : cluster
[WRN] 1 slow requests, 1 included below; oldest blocked for >
30.512702 secs
2017-04-17 07:35:31.958206 osd.42 10.80.3.105:6816/5326 2044 : cluster
[WRN] slow request 30.512702 seconds old, received at 2017-04-17
07:35:01.445405: osd_repop(client.12007895.1:105308123 13.39
13/1db63c39/rbd_data.922d95238e1f29.0000000000009652/head v
103722'24433545) currently commit_sent

On OSD:

2017-04-17 07:35:31.958169 7fab9205c700  0 log_channel(cluster) log
[WRN] : 1 slow requests, 1 included below; oldest blocked for >
30.512702 secs
2017-04-17 07:35:31.958200 7fab9205c700  0 log_channel(cluster) log
[WRN] : slow request 30.512702 seconds old, received at 2017-04-17
07:35:01.445405: osd_repop(client.12007895.1:105308123 13.39
13/1db63c39/rbd_data.922d95238e1f29.0000000000009652/head v
103722'24433545) currently commit_sent
20

Am I correct that these are just write slowness and most is related to
IO to snapshots (snaps)?

Thanks,
Alex

>
>                 Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com