Re: Slow requests

I was able to collect dump data during a slow request, but this time it coincided with high load average and iowait, so I keep watching.
This time it was on two particular OSDs, while yesterday it was on other OSDs.
In the dumps of these two OSDs I see that operations are stuck on queued_for_pg, for example:
            "description": "osd_op(client.13057605.0:51528 17.15 17:a93a5511:::notify.2:head [watch ping cookie 94259433737472] snapc 0=[] ondisk+write+known_if_redirected e10936)",
            "initiated_at": "2017-10-20 12:34:29.134946",
            "age": 484.314936,
            "duration": 55.421058,
            "type_data": {
                "flag_point": "started",
                "client_info": {
                    "client": "client.13057605",
                    "client_addr": "10.192.1.78:0/3748652520",
                    "tid": 51528
                },
                "events": [
                    {
                        "time": "2017-10-20 12:34:29.134946",
                        "event": "initiated"
                    },
                    {
                        "time": "2017-10-20 12:34:29.135075",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2017-10-20 12:35:24.555957",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2017-10-20 12:35:24.555978",
                        "event": "started"
                    },
                    {
                        "time": "2017-10-20 12:35:24.556004",
                        "event": "done"
                    }
                ]
            }
        },
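The gap between queued_for_pg (12:34:29) and reached_pg (12:35:24) is about 55 seconds, which matches the "duration", so the operation was simply waiting in the PG queue; that fits the high load and iowait I saw at the same time. While it is happening I am checking the in-flight ops and the disk roughly like this (osd.<id> and sdX are just placeholders for the affected OSD and its data device):

    ceph daemon osd.<id> dump_ops_in_flight    # ops currently queued or being processed
    ceph daemon osd.<id> dump_historic_ops     # recently completed slow ops
    iostat -x 1 /dev/sdX                       # utilization/await of the OSD's data disk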

I've read the thread http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021588.html.
It describes a very similar problem; could it be connected to Proxmox? I have a rather old version of proxmox-ve (4.4-80) and Ceph Jewel clients on the PVE nodes.

Best regards,
Olga Ukhina

Mobile: 8(905)-566-46-62

2017-10-20 11:05 GMT+03:00 Ольга Ухина <olga.uhina@xxxxxxxxx>:
Hi! Thanks for your help.
How can I increase the history interval for the command ceph daemon osd.<id> dump_historic_ops? It only covers the last several minutes.
I see slow requests on random OSDs each time, and on different hosts (there are three). As far as I can tell from the logs, the problem is not related to scrubbing.
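As for the history window, I guess it is controlled by osd_op_history_size and osd_op_history_duration, so something like the following should widen it (the values are only an example), but I'm not sure:

    ceph tell osd.* injectargs '--osd_op_history_size 200 --osd_op_history_duration 3600'
    # or per daemon via the admin socket:
    ceph daemon osd.<id> config set osd_op_history_size 200
    ceph daemon osd.<id> config set osd_op_history_duration 3600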

Regards, 
Olga Ukhina


2017-10-20 4:42 GMT+03:00 Brad Hubbard <bhubbard@xxxxxxxxxx>:
I guess you have both read and followed
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests

What was the result?
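In particular, what do the following show on one of the affected hosts while the requests are slow? (sdX here is just a placeholder for the OSD's data device.)

    ceph health detail | grep -i slow    # which OSDs are reporting blocked requests
    dmesg -T | tail -n 50                # any I/O errors on that OSD's host
    smartctl -a /dev/sdX                 # health of the underlying disk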

On Fri, Oct 20, 2017 at 2:50 AM, J David <j.david.lists@xxxxxxxxx> wrote:
> On Wed, Oct 18, 2017 at 8:12 AM, Ольга Ухина <olga.uhina@xxxxxxxxx> wrote:
>> I have a problem with ceph luminous 12.2.1.
>> […]
>> I have slow requests on different OSDs at random times (for example at night),
>> but I don’t see any other problems at the time it happens
>> […]
>> 2017-10-18 01:20:38.187326 mon.st3 mon.0 10.192.1.78:6789/0 22689 : cluster
>> [WRN] Health check update: 49 slow requests are blocked > 32 sec
>> (REQUEST_SLOW)
>
> This looks almost exactly like what we have been experiencing, and
> your use-case (Proxmox client using rbd) is the same as ours as well.
>
> Unfortunately we have not been able to find the source of the issue so far,
> and haven’t gotten much feedback from the list.  Extensive testing of
> every component has ruled out any hardware issue we can think of.
>
> Originally we thought our issue was related to deep-scrub, but that
> now appears not to be the case, as it happens even when nothing is
> being deep-scrubbed.  Nonetheless, although they aren’t the cause,
> they definitely make the problem much worse.  So you may want to check
> to see if deep-scrub operations are happening at the times where you
> see issues and (if so) whether the OSDs participating in the
> deep-scrub are the same ones reporting slow requests.
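> For example (the cluster log path may differ on your setup), roughly comparing the
> timestamps of the two kinds of entries on a monitor host:
>
>     grep 'deep-scrub' /var/log/ceph/ceph.log | tail
>     grep -E 'REQUEST_SLOW|slow request' /var/log/ceph/ceph.log | tail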
>
> Hopefully you have better luck finding/fixing this than we have!  It’s
> definitely been a very frustrating issue for us.
>
> Thanks!



--
Cheers,
Brad


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
