Re: Stuck IOs


 



[root@mon01 ceph]# ceph status

    cluster 55ebbc2d-c5b7-4beb-9688-0926cefee155

     health HEALTH_WARN

            2 requests are blocked > 32 sec

     monmap e1: 3 mons at {mon01=##########:6789/0,mon02=##########:6789/0,mon03=##########:6789/0}

            election epoch 74, quorum 0,1,2 mon01,mon02,mon03

     osdmap e8021: 24 osds: 24 up, 24 in

            flags sortbitwise,require_jewel_osds

      pgmap v7748381: 2624 pgs, 6 pools, 294 GB data, 363 kobjects

            893 GB used, 23436 GB / 24329 GB avail

                2624 active+clean

  client io 4665 kB/s rd, 4367 kB/s wr, 1896 op/s rd, 919 op/s wr

 

From: David Turner <drakonstein@xxxxxxxxx>
Date: Friday, September 22, 2017 at 10:12 AM
To: Matthew Stroud <mattstroud@xxxxxxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: [ceph-users] Stuck IOs

 

It shows that the blocked requests also reset and are now only a few minutes old instead of nearly a full day.  What is your full `ceph status`? The blocked requests are referring to missing objects.
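To put the two ages in perspective, here is a small sketch that converts the figures from the warnings in this thread into hours and minutes. The helper name `humanize_age` is made up for illustration; the numbers are copied from the quoted log lines.

```python
from datetime import timedelta


def humanize_age(seconds: float) -> str:
    """Render an age in seconds as H:MM:SS, dropping the fractional part."""
    return str(timedelta(seconds=int(seconds)))


# Ages copied from the slow-request warnings quoted in this thread.
age_in_first_report = 61440.543437   # reported by osd.2
age_in_later_report = 120.959814     # reported by osd.3 after the reset

print(humanize_age(age_in_first_report))  # 17:04:00
print(humanize_age(age_in_later_report))  # 0:02:00
```

So the oldest request had been waiting roughly 17 hours in the first report and about two minutes in the later one.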

 

On Fri, Sep 22, 2017 at 12:09 PM Matthew Stroud <mattstroud@xxxxxxxxxxxxx> wrote:

Got one to clear:

 

2017-09-22 10:06:23.030648 osd.3 [WRN] 2 slow requests, 1 included below; oldest blocked for > 120.959814 secs

2017-09-22 10:06:23.030657 osd.3 [WRN] slow request 120.959814 seconds old, received at 2017-09-22 10:04:22.070785: osd_op(client.301013529.0:2418 7.e637a4b3 measure [omap-get-vals 0~16] snapc 0=[] RETRY=1 ack+retry+read+balance_reads+skiprwlocks+known_if_redirected e8019) currently waiting for missing object

2017-09-22 10:06:24.030837 osd.3 [WRN] 2 slow requests, 1 included below; oldest blocked for > 121.959995 secs

2017-09-22 10:06:24.030844 osd.3 [WRN] slow request 120.899415 seconds old, received at 2017-09-22 10:04:23.131364: osd_op(client.300809948.0:42472 7.e637a4b3 measure [omap-get-vals 0~16] snapc 0=[] ack+read+balance_reads+skiprwlocks+known_if_redirected e8017) currently waiting for missing object

 

Thanks,

Matthew Stroud

 

From: David Turner <drakonstein@xxxxxxxxx>
Date: Friday, September 22, 2017 at 9:57 AM
To: Matthew Stroud <mattstroud@xxxxxxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: [ceph-users] Stuck IOs

 

Does the request remain blocked if you issue `ceph osd down 2`? Marking the offending OSD as down usually clears up blocked requests for me. At the very least it resets the timer on them, and the requests start blocking again if the OSD is starting to fail.

 

On Fri, Sep 22, 2017 at 11:51 AM Matthew Stroud <mattstroud@xxxxxxxxxxxxx> wrote:

It appears I have three stuck IOs after switching my tunables to optimal. We are running 10.2.9 and the offending pool is for gnocchi (which has caused us quite a bit of pain at this point). Here are the stuck IOs:

 

2017-09-22 09:05:40.095125 osd.2 ##########:6802/1453572 164 : cluster [WRN] 3 slow requests, 3 included below; oldest blocked for > 61440.543437 secs

2017-09-22 09:05:40.095138 osd.2 ##########:6802/1453572 165 : cluster [WRN] slow request 61440.543437 seconds old, received at 2017-09-21 16:01:39.551597: osd_op(client.268119861.0:141848 7.e637a4b3 measure [omap-get-vals 0~16] snapc 0=[] RETRY=1 ack+retry+read+balance_reads+skiprwlocks+known_if_redirected e8015) currently waiting for missing object

2017-09-22 09:05:40.095147 osd.2 ##########:6802/1453572 166 : cluster [WRN] slow request 61440.542366 seconds old, received at 2017-09-21 16:01:39.552668: osd_op(client.267462582.0:1628859 7.e637a4b3 measure [omap-get-vals 0~16] snapc 0=[] RETRY=1 ack+retry+read+balance_reads+skiprwlocks+known_if_redirected e8015) currently waiting for missing object

2017-09-22 09:05:40.095152 osd.2 ##########:6802/1453572 167 : cluster [WRN] slow request 61440.169811 seconds old, received at 2017-09-21 16:01:39.925223: osd_op(client.267488428.0:141465 7.e637a4b3 measure [omap-get-vals 0~16] snapc 0=[] RETRY=1 ack+retry+read+balance_reads+skiprwlocks+known_if_redirected e8015) currently waiting for missing object
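For anyone triaging similar warnings, the following is a rough sketch of how lines like the ones above could be grouped to see whether they all point at the same object. It is not a Ceph tool: the regular expression was written against the lines quoted in this thread only, and the field names (`target`, `object`, `state`, and so on) are labels chosen for the sketch. The `7.e637a4b3` token is assumed to be the pool ID plus the object's placement hash.

```python
import re
from collections import Counter

# Matches the per-request "slow request ..." detail lines as they appear in
# this thread. Summary lines ("3 slow requests, 3 included below") do not match.
SLOW_REQUEST = re.compile(
    r"(?P<osd>osd\.\d+) .*?slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"osd_op\((?P<client>\S+) (?P<target>\S+) (?P<object>\S+) "
    r"\[(?P<op>[^\]]*)\].*\) currently (?P<state>.+)$"
)


def parse_slow_request(line):
    """Return the named fields of one detail line, or None if it does not match."""
    match = SLOW_REQUEST.search(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count slow requests per (target, object, state) triple."""
    counts = Counter()
    for line in lines:
        fields = parse_slow_request(line)
        if fields:
            counts[(fields["target"], fields["object"], fields["state"])] += 1
    return counts


# Lines copied from the log excerpt above (addresses masked as in the original).
SAMPLE = [
    "2017-09-22 09:05:40.095125 osd.2 ##########:6802/1453572 164 : cluster [WRN] "
    "3 slow requests, 3 included below; oldest blocked for > 61440.543437 secs",
    "2017-09-22 09:05:40.095138 osd.2 ##########:6802/1453572 165 : cluster [WRN] "
    "slow request 61440.543437 seconds old, received at 2017-09-21 16:01:39.551597: "
    "osd_op(client.268119861.0:141848 7.e637a4b3 measure [omap-get-vals 0~16] "
    "snapc 0=[] RETRY=1 ack+retry+read+balance_reads+skiprwlocks+known_if_redirected "
    "e8015) currently waiting for missing object",
    "2017-09-22 09:05:40.095147 osd.2 ##########:6802/1453572 166 : cluster [WRN] "
    "slow request 61440.542366 seconds old, received at 2017-09-21 16:01:39.552668: "
    "osd_op(client.267462582.0:1628859 7.e637a4b3 measure [omap-get-vals 0~16] "
    "snapc 0=[] RETRY=1 ack+retry+read+balance_reads+skiprwlocks+known_if_redirected "
    "e8015) currently waiting for missing object",
]

for (target, obj, state), count in summarize(SAMPLE).items():
    print(count, target, obj, state)
```

In the lines quoted here every blocked request is an `omap-get-vals` read on the same `measure` object in `7.e637a4b3`, so the grouping collapses to a single entry.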

 

These stuck IOs aren’t currently causing any issues, but I can’t get them to clear. I have bounced the OSD multiple times, and they still haven’t cleared. Any advice?

 

Also, if anyone has pro tips on how to set up Ceph for gnocchi, I’m all ears.

 

Thanks,

Matthew Stroud

 



CONFIDENTIALITY NOTICE: This message is intended only for the use and review of the individual or entity to which it is addressed and may contain information that is privileged and confidential. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering the message solely to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify sender immediately by telephone or return email. Thank you.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 



