I was able to collect dump data during a slow request, and this time I saw that it coincided with high load average and iowait, so I am keeping watch.
This time it was two particular OSDs, but yesterday it was other OSDs.
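For watching the iowait side of this while it happens, something like the following can be left running on the affected hosts (a minimal sketch, assuming sysstat is installed; the 2-second interval is arbitrary):

    # extended per-device stats every 2 s: watch await and %util
    # on the disks backing the affected OSDs
    iostat -x 2

    # load average and the %wa (iowait) figure in one shot
    top -bn1 | head -n 5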
In the dumps from these two OSDs I see that operations are stuck in queued_for_pg, for example:
"description": "osd_op(client.13057605.0:51528 17.15 17:a93a5511:::notify.2:head [watch ping cookie 94259433737472] snapc 0=[] ondisk+write+known_if_redirected e10936)", "initiated_at": "2017-10-20 12:34:29.134946", "age": 484.314936, "duration": 55.421058, "type_data": { "flag_point": "started", "client_info": { "client": "client.13057605", "client_addr": "10.192.1.78:0/3748652520", "tid": 51528 }, "events": [ { "time": "2017-10-20 12:34:29.134946", "event": "initiated" }, { "time": "2017-10-20 12:34:29.135075", "event": "queued_for_pg" }, { "time": "2017-10-20 12:35:24.555957", "event": "reached_pg" }, { "time": "2017-10-20 12:35:24.555978", "event": "started" }, { "time": "2017-10-20 12:35:24.556004", "event": "done" } ] } },
This is a very similar problem; could it be connected to Proxmox? I have a fairly old proxmox-ve version (4.4-80) and Ceph jewel clients on the PVE nodes.
Best regards,
Olga Ukhina
Mobile: 8(905)-566-46-62
2017-10-20 11:05 GMT+03:00 Ольга Ухина <olga.uhina@xxxxxxxxx>:
Hi! Thanks for your help.

How can I increase the history interval for the command ceph daemon osd.<id> dump_historic_ops? It only shows the last several minutes. I see slow requests on random OSDs each time, and on different hosts (there are three). As far as I can see in the logs, the problem is not related to scrubbing.
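For reference, that window is controlled by two OSD options, osd_op_history_size (number of ops kept, default 20) and osd_op_history_duration (seconds kept, default 600), and both can be raised at runtime. A sketch, assuming the admin socket is reachable on the OSD host; 200 and 3600 are arbitrary example values:

    ceph daemon osd.<id> config set osd_op_history_size 200
    ceph daemon osd.<id> config set osd_op_history_duration 3600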
Regards,
Olga Ukhina

2017-10-20 4:42 GMT+03:00 Brad Hubbard <bhubbard@xxxxxxxxxx>:

I guess you have both read and followed
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests
What was the result?
On Fri, Oct 20, 2017 at 2:50 AM, J David <j.david.lists@xxxxxxxxx> wrote:
> On Wed, Oct 18, 2017 at 8:12 AM, Ольга Ухина <olga.uhina@xxxxxxxxx> wrote:
>> I have a problem with ceph luminous 12.2.1.
>> […]
>> I have slow requests on different OSDs at random times (for example at night),
>> but I don’t see any other problems at the time they occur.
>> […]
>> 2017-10-18 01:20:38.187326 mon.st3 mon.0 10.192.1.78:6789/0 22689 : cluster
>> [WRN] Health check update: 49 slow requests are blocked > 32 sec
>> (REQUEST_SLOW)
>
> This looks almost exactly like what we have been experiencing, and
> your use-case (Proxmox client using rbd) is the same as ours as well.
>
> Unfortunately we were not able to find the source of the issue so far,
> and haven’t gotten much feedback from the list. Extensive testing of
> every component has ruled out any hardware issue we can think of.
>
> Originally we thought our issue was related to deep-scrub, but that
> now appears not to be the case, as it happens even when nothing is
> being deep-scrubbed. Nonetheless, although they aren’t the cause,
> they definitely make the problem much worse. So you may want to check
> to see if deep-scrub operations are happening at the times where you
> see issues and (if so) whether the OSDs participating in the
> deep-scrub are the same ones reporting slow requests.
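>
> One way to check that correlation (a sketch, assuming the cluster log is
> at its default location on a mon host; per-PG "deep-scrub starts" /
> "deep-scrub ok" lines end up there):
>
>     # when deep-scrubs started/finished, and on which PGs
>     grep 'deep-scrub' /var/log/ceph/ceph.log
>
>     # PGs scrubbing right now, with their acting OSD sets
>     ceph pg dump pgs_brief | grep -i scrub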
>
> Hopefully you have better luck finding/fixing this than we have! It’s
> definitely been a very frustrating issue for us.
>
> Thanks!
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com