Re: Blocked requests

Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx> · Thu, 14 Dec 2017 09:22:19 +0100

Hello Matthew, thanks for your feedback!
  Please clarify one point: did you recreate the pool as an
erasure-coded one, or as a regular replicated one? In other words, do
you now have an erasure-coded pool in production as a gnocchi backend?

  In any case, given the instability you mention, experimenting with
BlueStore looks like a better alternative.

  Thanks again

			Fulvio

-------- Original Message --------
Subject: Re:  Blocked requests
From: Matthew Stroud <mattstroud@xxxxxxxxxxxxx>
To: Fulvio Galeazzi <fulvio.galeazzi@xxxxxxx>, Brian Andrus 
<brian.andrus@xxxxxxxxxxxxx>
CC: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Date: 12/13/2017 5:05 PM

We worked around it by destroying the pool and recreating it, though this isn’t really a fix. It turns out that Ceph copes poorly with small objects that have a high change rate (the access pattern that gnocchi produces). The cluster keeps going fine until an event (e.g. a reboot or an OSD failure) happens. I haven’t been able to find another solution.

I have heard that BlueStore handles this better, but that wasn’t stable on the release we are on.

Thanks,
Matthew Stroud

On 12/13/17, 3:56 AM, "Fulvio Galeazzi" <fulvio.galeazzi@xxxxxxx> wrote:

     Hello Matthew,
          I am now facing the same issue and found this message of yours.
        Were you eventually able to figure out what the problem with
     erasure-coded pools is?

     At first sight, the bugzilla page linked by Brian does not seem to
     specifically mention erasure-coded pools...

        Thanks for your help

     Fulvio

     -------- Original Message --------
     Subject: Re:  Blocked requests
     From: Matthew Stroud <mattstroud@xxxxxxxxxxxxx>
     To: Brian Andrus <brian.andrus@xxxxxxxxxxxxx>
     CC: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
     Date: 09/07/2017 11:01 PM

     > After some troubleshooting, the issues appear to be caused by gnocchi
     > using rados. I’m trying to figure out why.
     >
     > Thanks,
     >
     > Matthew Stroud
     >
     > *From: *Brian Andrus <brian.andrus@xxxxxxxxxxxxx>
     > *Date: *Thursday, September 7, 2017 at 1:53 PM
     > *To: *Matthew Stroud <mattstroud@xxxxxxxxxxxxx>
     > *Cc: *David Turner <drakonstein@xxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx"
     > <ceph-users@xxxxxxxxxxxxxx>
     > *Subject: *Re:  Blocked requests
     >
     > "ceph osd blocked-by" can do the same thing as that provided script.
     >
     > Can you post relevant osd.10 logs and a pg dump of an affected placement
     > group? Specifically interested in recovery_state section.
     >
     > Hopefully you were careful in how you were rebooting OSDs, and not
     > rebooting multiple in the same failure domain before recovery was able
     > to occur.
     >
     > On Thu, Sep 7, 2017 at 12:30 PM, Matthew Stroud
     > <mattstroud@xxxxxxxxxxxxx> wrote:
     >
     >     Here is the output of your snippet:
     >
     >     [root@mon01 ceph-conf]# bash /tmp/ceph_foo.sh
     >
     >            6 osd.10
     >
     >     52  ops are blocked > 4194.3   sec on osd.17
     >
     >     9   ops are blocked > 2097.15  sec on osd.10
     >
     >     4   ops are blocked > 1048.58  sec on osd.10
     >
     >     39  ops are blocked > 262.144  sec on osd.10
     >
     >     19  ops are blocked > 131.072  sec on osd.10
     >
     >     6   ops are blocked > 65.536   sec on osd.10
     >
     >     2   ops are blocked > 32.768   sec on osd.10
     >
     >     Here is some backfilling info:
     >
     >     [root@mon01 ceph-conf]# ceph status
     >
     >          cluster 55ebbc2d-c5b7-4beb-9688-0926cefee155
     >
     >           health HEALTH_WARN
     >
     >                  5 pgs backfilling
     >
     >                  5 pgs degraded
     >
     >                  5 pgs stuck degraded
     >
     >                  5 pgs stuck unclean
     >
     >                  5 pgs stuck undersized
     >
     >                  5 pgs undersized
     >
     >                  122 requests are blocked > 32 sec
     >
     >                  recovery 2361/1097929 objects degraded (0.215%)
     >
     >                  recovery 5578/1097929 objects misplaced (0.508%)
     >
     >           monmap e1: 3 mons at
     >     {mon01=10.20.57.10:6789/0,mon02=10.20.57.11:6789/0,mon03=10.20.57.12:6789/0}
     >
     >                  election epoch 58, quorum 0,1,2 mon01,mon02,mon03
     >
     >           osdmap e6511: 24 osds: 21 up, 21 in; 5 remapped pgs
     >
     >                  flags sortbitwise,require_jewel_osds
     >
     >            pgmap v6474659: 2592 pgs, 5 pools, 333 GB data, 356 kobjects
     >
     >                  1005 GB used, 20283 GB / 21288 GB avail
     >
     >                  2361/1097929 objects degraded (0.215%)
     >
     >                  5578/1097929 objects misplaced (0.508%)
     >
     >                      2587 active+clean
     >
     >                         5 active+undersized+degraded+remapped+backfilling
     >
     >     [root@mon01 ceph-conf]# ceph pg dump_stuck unclean
     >
     >     ok
     >
     >     pg_stat  state                                            up         up_primary  acting   acting_primary
     >     3.5c2    active+undersized+degraded+remapped+backfilling  [17,2,10]  17          [17,2]   17
     >     3.54a    active+undersized+degraded+remapped+backfilling  [10,19,2]  10          [10,17]  10
     >     5.3b     active+undersized+degraded+remapped+backfilling  [3,19,0]   3           [10,17]  10
     >     5.b3     active+undersized+degraded+remapped+backfilling  [10,19,2]  10          [10,17]  10
     >     3.180    active+undersized+degraded+remapped+backfilling  [17,10,6]  17          [22,19]  22
     >
     >     Most of the backfilling was caused by restarting OSDs to clear
     >     blocked IO. Here are some of the blocked IOs:
     >
     >     /var/log/ceph/ceph.log:2017-09-07 13:29:36.978559 osd.10
     >     10.20.57.15:6806/7029 9362 : cluster
     >     [WRN] slow request 60.834494 seconds old, received at 2017-09-07
     >     13:28:36.143920: osd_op(client.114947.0:2039090 5.e637a4b3
     >     (undecoded) ack+read+balance_reads+skiprwlocks+known_if_redirected
     >     e6511) currently queued_for_pg
     >
     >     /var/log/ceph/ceph.log:2017-09-07 13:29:36.978565 osd.10
     >     10.20.57.15:6806/7029 9363 : cluster
     >     [WRN] slow request 240.661052 seconds old, received at 2017-09-07
     >     13:25:36.317363: osd_op(client.246934107.0:3 5.f69addd6 (undecoded)
     >     ack+read+known_if_redirected e6511) currently queued_for_pg
     >
     >     /var/log/ceph/ceph.log:2017-09-07 13:29:36.978571 osd.10
     >     10.20.57.15:6806/7029 9364 : cluster
     >     [WRN] slow request 240.660763 seconds old, received at 2017-09-07
     >     13:25:36.317651: osd_op(client.246944377.0:2 5.f69addd6 (undecoded)
     >     ack+read+known_if_redirected e6511) currently queued_for_pg
     >
     >     /var/log/ceph/ceph.log:2017-09-07 13:29:36.978576 osd.10
     >     10.20.57.15:6806/7029 9365 : cluster
     >     [WRN] slow request 240.660675 seconds old, received at 2017-09-07
     >     13:25:36.317740: osd_op(client.246944377.0:3 5.f69addd6 (undecoded)
     >     ack+read+known_if_redirected e6511) currently queued_for_pg
     >
     >     /var/log/ceph/ceph.log:2017-09-07 13:29:42.979367 osd.10
     >     10.20.57.15:6806/7029 9366 : cluster
     >     [WRN] 72 slow requests, 3 included below; oldest blocked for >
     >     1820.342287 secs
     >
     >     /var/log/ceph/ceph.log:2017-09-07 13:29:42.979373 osd.10
     >     10.20.57.15:6806/7029 9367 : cluster
     >     [WRN] slow request 30.606290 seconds old, received at 2017-09-07
     >     13:29:12.372999: osd_op(client.115008.0:996024003 5.e637a4b3
     >     (undecoded) ondisk+write+skiprwlocks+known_if_redirected e6511)
     >     currently queued_for_pg
     >
     >     /var/log/ceph/ceph.log:2017-09-07 13:29:42.979377 osd.10
     >     10.20.57.15:6806/7029 9368 : cluster
     >     [WRN] slow request 30.554317 seconds old, received at 2017-09-07
     >     13:29:12.424972: osd_op(client.115020.0:1831942 5.39f2d3b
     >     (undecoded) ack+read+known_if_redirected e6511) currently queued_for_pg
     >
     >     /var/log/ceph/ceph.log:2017-09-07 13:29:42.979383 osd.10
     >     10.20.57.15:6806/7029 9369 : cluster
     >     [WRN] slow request 30.368086 seconds old, received at 2017-09-07
     >     13:29:12.611204: osd_op(client.115014.0:73392774 5.e637a4b3
     >     (undecoded) ack+read+balance_reads+skiprwlocks+known_if_redirected
     >     e6511) currently queued_for_pg
     >
     >     /var/log/ceph/ceph.log:2017-09-07 13:29:43.979553 osd.10
     >     10.20.57.15:6806/7029 9370 : cluster
     >     [WRN] 73 slow requests, 1 included below; oldest blocked for >
     >     1821.342499 secs
     >
     >     /var/log/ceph/ceph.log:2017-09-07 13:29:43.979559 osd.10
     >     10.20.57.15:6806/7029 9371 : cluster
     >     [WRN] slow request 30.452344 seconds old, received at 2017-09-07
     >     13:29:13.527157: osd_op(client.115011.0:483954528 5.e637a4b3
     >     (undecoded) ack+read+balance_reads+skiprwlocks+known_if_redirected
     >     e6511) currently queued_for_pg
     >
     >     *From: *David Turner <drakonstein@xxxxxxxxx>
     >     *Date: *Thursday, September 7, 2017 at 1:17 PM
     >     *To: *Matthew Stroud <mattstroud@xxxxxxxxxxxxx>,
     >     "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
     >     *Subject: *Re:  Blocked requests
     >
     >     I would recommend pushing forward with the update instead of rolling
     >     back.  Ceph doesn't have a track record of rolling back to a
     >     previous version.
     >
     >     I don't have enough information to really make sense of the ceph
     >     health detail output.  Like are the osds listed all on the same
     >     host?  Over time of watching this output, are some of the requests
     >     clearing up?  Are there any other patterns?  I put the following in
     >     a script and run it in a watch command to try and follow patterns
     >     when I'm plagued with blocked requests.
     >
     >          # Falls back to Ceph's default cluster name if $cluster is unset.
     >          output=$(ceph --cluster "${cluster:-ceph}" health detail \
     >              | grep 'ops are blocked' | sort -nrk6 \
     >              | sed 's/ ops/+ops/' | sed 's/ sec/+sec/' \
     >              | column -t -s'+')
     >
     >          echo "$output" | grep -v 'on osd'
     >
     >          echo "$output" | grep -Eo 'osd\.[0-9]+' | sort -n | uniq -c \
     >              | grep -v ' 1 '
     >
     >          echo "$output" | grep 'on osd'
     >
     >     Why do you have backfilling?  You haven't mentioned that you have
     >     any backfilling yet.  Installing an update shouldn't cause
     >     backfilling, but it's likely related to your blocked requests.
     >
     >     On Thu, Sep 7, 2017 at 2:24 PM Matthew Stroud
     >     <mattstroud@xxxxxxxxxxxxx> wrote:
     >
     >         Well, in the meantime things have gone from bad to worse: the
     >         cluster isn’t rebuilding and clients are unable to pass IO to
     >         the cluster. When this first took place, we started rolling back
     >         to 10.2.7; though that was successful, it didn’t help with the
     >         issue. Here is the command output:
     >
     >         HEALTH_WARN 39 pgs backfill_wait; 5 pgs backfilling; 43 pgs
     >         degraded; 43 pgs stuck degraded; 44 pgs stuck unclean; 43 pgs
     >         stuck undersized; 43 pgs undersized; 367 requests are blocked >
     >         32 sec; 14 osds have slow requests; recovery 4678/1097738
     >         objects degraded (0.426%); recovery 10364/1097738 objects
     >         misplaced (0.944%)
     >
     >         pg 3.624 is stuck unclean for 1402.022837, current state
     >         active+undersized+degraded+remapped+wait_backfill, last acting
     >         [12,9]
     >
     >         pg 3.587 is stuck unclean for 2536.693566, current state
     >         active+undersized+degraded+remapped+wait_backfill, last acting
     >         [18,13]
     >
     >         pg 3.45f is stuck unclean for 1421.178244, current state
     >         active+undersized+degraded+remapped+wait_backfill, last acting
     >         [14,10]
     >
     >         pg 3.41a is stuck unclean for 1505.091187, current state
     >         active+undersized+degraded+remapped+wait_backfill, last acting
     >         [9,23]
     >
     >         pg 3.4cc is stuck unclean for 1560.824332, current state
     >         active+undersized+degraded+remapped+wait_backfill, last acting
     >         [18,10]
     >
     >         <snip>
     >
     >         pg 3.188 is stuck degraded for 1207.118130, current state
     >         active+undersized+degraded+remapped+wait_backfill, last acting
     >         [14,17]
     >
     >         pg 3.768 is stuck degraded for 1123.722910, current state
     >         active+undersized+degraded+remapped+wait_backfill, last acting
     >         [11,18]
     >
     >         pg 3.77c is stuck degraded for 1211.981606, current state
     >         active+undersized+degraded+remapped+wait_backfill, last acting [9,2]
     >
     >         pg 3.7d1 is stuck degraded for 1074.422756, current state
     >         active+undersized+degraded+remapped+wait_backfill, last acting
     >         [10,12]
     >
     >         pg 3.7d1 is active+undersized+degraded+remapped+wait_backfill,
     >         acting [10,12]
     >
     >         pg 3.77c is active+undersized+degraded+remapped+wait_backfill,
     >         acting [9,2]
     >
     >         pg 3.768 is active+undersized+degraded+remapped+wait_backfill,
     >         acting [11,18]
     >
     >         pg 3.709 is active+undersized+degraded+remapped+wait_backfill,
     >         acting [10,4]
     >
     >         <snip>
     >
     >         pg 3.5d8 is active+undersized+degraded+remapped+wait_backfill,
     >         acting [2,10]
     >
     >         pg 3.5dc is active+undersized+degraded+remapped+wait_backfill,
     >         acting [8,19]
     >
     >         pg 3.5f8 is active+undersized+degraded+remapped+wait_backfill,
     >         acting [2,21]
     >
     >         pg 3.624 is active+undersized+degraded+remapped+wait_backfill,
     >         acting [12,9]
     >
     >         2 ops are blocked > 1048.58 sec on osd.9
     >
     >         3 ops are blocked > 65.536 sec on osd.9
     >
     >         7 ops are blocked > 1048.58 sec on osd.8
     >
     >         1 ops are blocked > 524.288 sec on osd.8
     >
     >         1 ops are blocked > 131.072 sec on osd.8
     >
     >         <snip>
     >
     >         1 ops are blocked > 524.288 sec on osd.2
     >
     >         1 ops are blocked > 262.144 sec on osd.2
     >
     >         2 ops are blocked > 65.536 sec on osd.21
     >
     >         9 ops are blocked > 1048.58 sec on osd.5
     >
     >         9 ops are blocked > 524.288 sec on osd.5
     >
     >         71 ops are blocked > 131.072 sec on osd.5
     >
     >         19 ops are blocked > 65.536 sec on osd.5
     >
     >         35 ops are blocked > 32.768 sec on osd.5
     >
     >         14 osds have slow requests
     >
     >         recovery 4678/1097738 objects degraded (0.426%)
     >
     >         recovery 10364/1097738 objects misplaced (0.944%)
     >
     >         *From: *David Turner <drakonstein@xxxxxxxxx>
     >         *Date: *Thursday, September 7, 2017 at 11:33 AM
     >         *To: *Matthew Stroud <mattstroud@xxxxxxxxxxxxx>,
     >         "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
     >         *Subject: *Re:  Blocked requests
     >
     >         To be fair, other times I have to go in and tweak configuration
     >         settings and timings to resolve chronic blocked requests.
     >
     >         On Thu, Sep 7, 2017 at 1:32 PM David Turner
     >         <drakonstein@xxxxxxxxx> wrote:
     >
     >             `ceph health detail` will give a little more information
     >             about the blocked requests: specifically, which OSDs the
     >             requests are blocked on and how long they have actually
     >             been blocked (as opposed to '> 32 sec').  I usually find a
     >             pattern after watching that for a time and narrow things
     >             down to an OSD, journal, etc.  Sometimes I just need to
     >             restart a specific OSD and all is well.
     >
     >             On Thu, Sep 7, 2017 at 10:33 AM Matthew Stroud
     >             <mattstroud@xxxxxxxxxxxxx> wrote:
     >
     >                 After updating from 10.2.7 to 10.2.9 I have a bunch of
     >                 blocked requests for ‘currently waiting for missing
     >                 object’. I have tried bouncing the osds and rebooting
     >                 the osd nodes, but that just moves the problems around.
     >                 Previous to this upgrade we had no issues. Any ideas of
     >                 what to look at?
     >
     >                 Thanks,
     >
     >                 Matthew Stroud
     >
     >                 ------------------------------------------------------------------------
     >
     >
     >                 CONFIDENTIALITY NOTICE: This message is intended only
     >                 for the use and review of the individual or entity to
     >                 which it is addressed and may contain information that
     >                 is privileged and confidential. If the reader of this
     >                 message is not the intended recipient, or the employee
     >                 or agent responsible for delivering the message solely
     >                 to the intended recipient, you are hereby notified that
     >                 any dissemination, distribution or copying of this
     >                 communication is strictly prohibited. If you have
     >                 received this communication in error, please notify
     >                 sender immediately by telephone or return email. Thank you.
     >
     >                 _______________________________________________
     >                 ceph-users mailing list
     >                 ceph-users@xxxxxxxxxxxxxx
     >                 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
     >
     > --
     >
     > Brian Andrus | Cloud Systems Engineer | DreamHost
     >
     > brian.andrus@xxxxxxxxxxxxx | www.dreamhost.com
     >
     >
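
A note on reading the `ceph health detail` output quoted above: the per-OSD lines (for example `9 ops are blocked > 2097.15 sec on osd.10`) can be summed per OSD to see where blocked requests concentrate. What follows is only a minimal sketch and was not posted in the thread. The function name `blocked_per_osd` is hypothetical, and the parsing assumes the Jewel-era line format shown in this thread; other releases may word these lines differently.

```shell
# Sum blocked ops per OSD from `ceph health detail`-style text on stdin.
# Assumed line format (taken from the thread):
#   "9 ops are blocked > 2097.15 sec on osd.10"
# Lines that do not end in an OSD name are ignored.
blocked_per_osd() {
    awk '/ops are blocked/ && $NF ~ /^osd\.[0-9]+$/ { total[$NF] += $1 }
         END { for (osd in total) print total[osd], osd }' \
        | sort -k1,1nr -k2,2   # highest count first, then by OSD name
}
```

It would be fed from the live command, e.g. `ceph health detail | blocked_per_osd`. Applied to the seven "ops are blocked" lines in the Sep 7 output quoted above, it prints `79 osd.10` followed by `52 osd.17`, which matches the observation that most of the stuck requests sat on osd.10.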
