On Mon, Apr 17, 2017 at 1:42 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Apr 13, 2017 at 4:24 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> On Thu, Apr 13, 2017 at 5:39 AM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>> On Wed, Apr 12, 2017 at 10:51 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>> On Wed, Apr 12, 2017 at 4:28 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>> Hi Ilya,
>>>>>
>>>>> On Wed, Apr 12, 2017 at 4:58 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On Tue, Apr 11, 2017 at 3:10 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>>> > Hi Ilya,
>>>>>> >
>>>>>> > On Tue, Apr 11, 2017 at 4:06 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>>>> >> On Tue, Apr 11, 2017 at 4:01 AM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>>> >>> On Mon, Apr 10, 2017 at 2:16 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>>> >>>> I am trying to understand the cause of a problem we started encountering a few weeks ago. There are 30 or so messages per hour on OSD nodes of this type:
>>>>>> >>>>
>>>>>> >>>> ceph-osd.33.log:2017-04-10 13:42:39.935422 7fd7076d8700 0 bad crc in data 2227614508 != exp 2469058201
>>>>>> >>>>
>>>>>> >>>> and
>>>>>> >>>>
>>>>>> >>>> 2017-04-10 13:42:39.939284 7fd722c42700 0 -- 10.80.3.25:6826/5752 submit_message osd_op_reply(1826606251 rbd_data.922d95238e1f29.00000000000101bf [set-alloc-hint object_size 16777216 write_size 16777216,write 6328320~12288] v103574'18626765 uv18626765 ondisk = 0) v6 remote, 10.80.3.216:0/1934733503, failed lossy con, dropping message 0x3b55600
>>>>>> >>>>
>>>>>> >>>> On a client, sometimes, but not corresponding to the above:
>>>>>> >>>>
>>>>>> >>>> Apr 10 11:53:15 roc-5r-scd216 kernel: [4906599.023174] libceph: osd96 10.80.3.25:6822 socket error on write
>>>>>> >>>>
>>>>>> >>>> And, from time to time, slow requests:
>>>>>> >>>>
>>>>>> >>>> 2017-04-10 13:00:04.280686 osd.91 10.80.3.45:6808/5665 231 : cluster [WRN] slow request 30.108325 seconds old, received at 2017-04-10 12:59:34.172283: osd_op(client.11893449.1:324079247 rbd_data.8fcdfb238e1f29.00000000000187e7 [set-alloc-hint object_size 16777216 write_size 16777216,write 10772480~8192] 14.ed0bcdec ondisk+write e103545) currently waiting for subops from 2,104
>>>>>> >>>> 2017-04-10 13:00:06.280949 osd.91 10.80.3.45:6808/5665 232 : cluster [WRN] 2 slow requests, 1 included below; oldest blocked for > 32.108610 secs
>>>>>> >>>>
>>>>>> >>>> Questions:
>>>>>> >>>>
>>>>>> >>>> 1. Is there any way to drill further into the "bad crc" messages? Sometimes there is nothing before or after them, so how can I determine where one came from or was going to - another OSD, a client, and which one?
>>>>>> >>>>
>>>>>> >>>> 2. The network seems OK - there are no errors on the NICs, and regression testing does not show any issues. I realize this can be disk response time, but using Christian Balzer's atop recommendation shows a pretty normal system. What is my best course of troubleshooting here - dump historic ops on the OSD, wireshark the links, or anything else?
>>>>>> >>>>
>>>>>> >>>> 3. Christian, if you are looking at this, what would be your red flags in atop?
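A side note on question 1: in the sample above, the "failed lossy con, dropping message" entry, which does name the remote peer (10.80.3.216:0/1934733503), lands a few milliseconds after the bad crc line on the same OSD. So one rough way to attribute the bad crc entries is to pair each of them with the nearest lossy con drop in the same OSD log. A minimal sketch of that pairing (the log path and the 5-second window are assumptions, not anything Ceph guarantees):

#!/usr/bin/env python
# Rough sketch: pair each "bad crc" entry in an OSD log with the nearest
# "failed lossy con, dropping message" entry, which names the remote peer.
# The log path and the 5-second pairing window are assumptions.
import re
import sys
from datetime import datetime, timedelta

TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")
PEER_RE = re.compile(r"remote,? ([0-9.]+:\d+/\d+)")

def parse(path):
    crcs, drops = [], []
    with open(path) as f:
        for line in f:
            m = TS_RE.match(line)
            if not m:
                continue
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            if "bad crc in data" in line:
                crcs.append(ts)
            elif "failed lossy con, dropping message" in line:
                peer = PEER_RE.search(line)
                drops.append((ts, peer.group(1) if peer else "unknown"))
    return crcs, drops

def main(path="/var/log/ceph/ceph-osd.33.log"):   # path is an assumption
    crcs, drops = parse(path)
    window = timedelta(seconds=5)
    for ts in crcs:
        nearby = [(abs(dts - ts), peer) for dts, peer in drops if abs(dts - ts) <= window]
        if nearby:
            delta, peer = min(nearby)
            print("%s  bad crc; closest dropped reply %.3fs away, remote %s"
                  % (ts, delta.total_seconds(), peer))
        else:
            print("%s  bad crc; no lossy con drop within %ds" % (ts, window.seconds))

if __name__ == "__main__":
    main(*sys.argv[1:])

Bad crc entries with no nearby drop would still be ambiguous from these level 0 messages alone.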
>>>>>> >>>
>>>>>> >>> One more note: OSD nodes are running kernel 4.10.2-041002-generic and clients - 4.4.23-040423-generic
>>>>>> >>
>>>>>> >> Hi Alex,
>>>>>> >>
>>>>>> >> Did you upgrade the kernel client from < 4.4 to 4.4 somewhere in that time frame by any chance?
>>>>>> >
>>>>>> > Yes, they were upgraded from 4.2.8 to 4.4.23 in October.
>>>>>>
>>>>>> There is a block layer bug in 4.4 and later kernels [1].  It effectively undoes krbd commit [2], which prevents pages from being further updated while in-flight.  The timeline doesn't fit though, so it's probably unrelated -- not every workload can trigger it...
>>>>>>
>>>>>> [1] http://tracker.ceph.com/issues/19275
>>>>>> [2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bae818ee1577c27356093901a0ea48f672eda514
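To illustrate why a block layer regression of this kind surfaces as "bad crc in data": the sender computes the data CRC over the pages backing a write before they go out on the wire, so if those pages are modified while the I/O is still in flight, the receiver's CRC over the bytes that actually arrived no longer matches the advertised value. A toy model of the effect (not the kernel or messenger code, just the arithmetic):

# Toy model only: a data CRC computed over pages that later change
# no longer matches what the receiver computes over the bytes it got.
import zlib

page = bytearray(b"A" * 4096)               # stand-in for a page queued for write-out

advertised_crc = zlib.crc32(bytes(page))    # sender computes the data CRC first

page[0:4] = b"BBBB"                         # page is modified while the I/O is in flight

received_crc = zlib.crc32(bytes(page))      # receiver checksums the bytes that arrived

if received_crc != advertised_crc:
    print("bad crc in data %d != exp %d" % (received_crc, advertised_crc))

As far as I understand, the receiving side then just drops the message and the op is retried, which is why this shows up as log noise and resends rather than on-disk corruption.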
>>>>>
>>>>> Would the workaround be to drop to 4.3?
>>>>
>>>> Unfortunately, short of installing a custom kernel or disabling data CRCs entirely, yes.  I'll poke the respective maintainer again later today...
>>>
>>> I downgraded to 4.3.6 and so far all the CRC messages and lossy con errors have gone away completely.
>>
>> Thanks for the confirmation.  I'll update #19275 with stable kernel versions when it's fixed.
>
> Thank you.  I can definitely confirm that the lossy con and crc mismatch messages are gone.  However, the slow requests remain, and they are of this format:
>
> 2017-04-17 07:26:35.413710 osd.111 10.80.3.25:6818/5620 587 : cluster [WRN] slow request 34.205118 seconds old, received at 2017-04-17 07:26:01.208492: osd_op(client.11998611.1:426520806 rbd_data.85ae77238e1f29.000000000000c27c [set-alloc-hint object_size 16777216 write_size 16777216,write 208896~4096] 26.2bd4f8b9 snapc 482=[] ondisk+write e103717) currently waiting for subops from 95,101
> 2017-04-17 07:26:35.413716 osd.111 10.80.3.25:6818/5620 588 : cluster [WRN] slow request 33.851564 seconds old, received at 2017-04-17 07:26:01.562046: osd_op(client.12007895.1:105126959 rbd_data.922d95238e1f29.0000000000009656 [set-alloc-hint object_size 16777216 write_size 16777216,write 16621568~4096] 13.8fd42a47 snapc 508=[508,501,4ef] ondisk+write e103717) currently waiting for subops from 49,213
> 2017-04-17 07:27:46.291096 osd.174 10.80.3.85:6830/5539 189 : cluster [WRN] 2 slow requests, 1 included below; oldest blocked for > 36.965481 secs
> 2017-04-17 07:27:46.291105 osd.174 10.80.3.85:6830/5539 190 : cluster [WRN] slow request 30.201051 seconds old, received at 2017-04-17 07:27:16.090000: osd_op(client.12007895.1:105127746 rbd_data.7a6cfe238e1f29.0000000000000001 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~16384] 21.258dfd1a snapc 12=[] ondisk+write e103717) currently waiting for subops from 111
> 2017-04-17 07:35:31.958179 osd.42 10.80.3.105:6816/5326 2043 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.512702 secs
> 2017-04-17 07:35:31.958206 osd.42 10.80.3.105:6816/5326 2044 : cluster [WRN] slow request 30.512702 seconds old, received at 2017-04-17 07:35:01.445405: osd_repop(client.12007895.1:105308123 13.39 13/1db63c39/rbd_data.922d95238e1f29.0000000000009652/head v 103722'24433545) currently commit_sent
>
> On the OSD:
>
> 2017-04-17 07:35:31.958169 7fab9205c700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.512702 secs
> 2017-04-17 07:35:31.958200 7fab9205c700 0 log_channel(cluster) log [WRN] : slow request 30.512702 seconds old, received at 2017-04-17 07:35:01.445405: osd_repop(client.12007895.1:105308123 13.39 13/1db63c39/rbd_data.922d95238e1f29.0000000000009652/head v 103722'24433545) currently commit_sent
> 20
>
> Am I correct that these are just write slowness, and that most of it is related to IO to snapshots (snaps)?

Yeah, these are completely unrelated to the CRC issue.  I can't tell for sure whether it has anything to do with snaps/clones from that output.

Thanks,

                Ilya
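On the snaps question specifically: the slow request warnings themselves carry both the snap context (snapc 482=[] vs snapc 508=[508,501,4ef]) and the OSDs being waited on ("waiting for subops from ..."), so a quick tally over the cluster log can show how many of the slow writes touch objects with a non-empty snap context and which OSDs keep showing up as the laggards. A rough sketch, assuming the usual ceph.log location and the message format quoted above:

#!/usr/bin/env python
# Rough tally of slow request warnings: how many carry a non-empty snap
# context, and which OSDs appear most often in "waiting for subops from".
# The default log path and the message format are assumptions.
import re
import sys
from collections import Counter

def main(path="/var/log/ceph/ceph.log"):   # path is an assumption
    snap, no_snap = 0, 0
    subop_osds = Counter()
    with open(path) as f:
        for line in f:
            if "slow request" not in line or "osd_op(" not in line:
                continue
            m = re.search(r"snapc \w+=\[([^\]]*)\]", line)
            if m:
                if m.group(1).strip():
                    snap += 1       # non-empty snap context, e.g. [508,501,4ef]
                else:
                    no_snap += 1    # empty snap context, e.g. []
            m = re.search(r"waiting for subops from ([\d,]+)", line)
            if m:
                subop_osds.update(m.group(1).split(","))
    print("slow writes with snap context: %d, without: %d" % (snap, no_snap))
    for osd, count in subop_osds.most_common(10):
        print("osd.%s blamed %d times" % (osd, count))

if __name__ == "__main__":
    main(*sys.argv[1:])

If one or two OSDs dominate the subops tally, their disks or hosts are probably the first place to look.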