Hi,
following your suggestion to look at the output of dump_historic_ops and dump_ops_in_flight, I waited for new slow request messages and dumped the historic_ops into a file. The reporting OSD shows lots of "waiting for rw locks" events and a duration of more than 30 seconds:
"age": 366.044746,
"duration": 32.491506,
"type_data": [
"commit sent; apply or cleanup",
{
"client": "client.9664429",
"tid": 130439910
},
[
{
"time": "2017-01-12 10:38:15.227649",
"event": "initiated"
},
{
"time": "2017-01-12 10:38:15.232310",
"event": "reached_pg"
},
{
"time": "2017-01-12 10:38:15.232341",
"event": "waiting for rw locks"
},
{
"time": "2017-01-12 10:38:15.268819",
"event": "reached_pg"
},
[... output truncated ...]
{
"time": "2017-01-12 10:38:45.515055",
"event": "waiting for rw locks"
},
{
"time": "2017-01-12 10:38:46.921095",
"event": "reached_pg"
},
{
"time": "2017-01-12 10:38:46.921157",
"event": "started"
},
{
"time": "2017-01-12 10:38:46.921342",
"event": "waiting for subops from 9,15"
},
{
"time": "2017-01-12 10:38:46.921724",
"event": "commit_queued_for_journal_write"
},
{
"time": "2017-01-12 10:38:46.922186",
"event": "write_thread_in_journal_buffer"
},
{
"time": "2017-01-12 10:38:46.931103",
"event": "sub_op_commit_rec"
},
{
"time": "2017-01-12 10:38:46.968730",
"event": "sub_op_commit_rec"
},
{
"time": "2017-01-12 10:38:47.717770",
"event": "journaled_completion_queued"
},
{
"time": "2017-01-12 10:38:47.718280",
"event": "op_commit"
},
{
"time": "2017-01-12 10:38:47.718359",
"event": "commit_sent"
},
{
"time": "2017-01-12 10:38:47.718890",
"event": "op_applied"
},
{
"time": "2017-01-12 10:38:47.719154",
"event": "done"
}
There were about 70 "waiting for rw locks" events in total, so I truncated the output.
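In case it helps, something like this quick sketch is what I'd use to see where the time goes between consecutive events (assuming the dump_historic_ops output was saved to a file; the file name is just an example, and the top-level key may be "Ops" or "ops" depending on the version, which I haven't verified):

#!/usr/bin/env python
# Sketch: print the largest gaps between consecutive events of each op in a
# saved dump_historic_ops output. Assumes the hammer-style layout shown
# above: type_data = [description, {client/tid}, [events]].
import json
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

with open("historic_ops_osd16.json") as f:   # hypothetical file name
    dump = json.load(f)

for op in dump.get("Ops", dump.get("ops", [])):
    events = op["type_data"][2]
    prev = None
    for ev in events:
        t = datetime.strptime(ev["time"], FMT)
        if prev is not None:
            gap = (t - prev[1]).total_seconds()
            if gap > 1.0:   # only report gaps larger than one second
                print("%6.2fs between '%s' and '%s'" % (gap, prev[0], ev["event"]))
        prev = (ev["event"], t)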
Based on the message "waiting for subops from 9,15" I also dumped the
historic_ops for these two OSDs.
Duration on OSD.9:
"initiated_at": "2017-01-12 10:38:29.258221",
"age": 54.069919,
"duration": 20.831568,
Duration on OSD.15:
"initiated_at": "2017-01-12 10:38:23.695098",
"age": 112.118210,
"duration": 26.452526,
They also contain many "waiting for rw locks" events, though not as many as the dump from the reporting OSD.
To me it looks like the two OSDs each take a long time to process their requests (only slightly less than 30 seconds), and on the reporting (primary?) OSD this adds up to more than 30 seconds. Is the reporting OSD always the primary?
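For reference, this is roughly how I'd check which OSD is the primary for the affected PG (a sketch, not verified on this cluster; the PG id is a placeholder, and my understanding is that for replicated pools the primary is the first OSD in the acting set):

#!/usr/bin/env python
# Sketch: show the acting set and primary for a given PG.
# "ceph pg map <pgid>" prints something like:
#   osdmap e12440 pg 5.3f (5.3f) -> up [16,9,15] acting [16,9,15]
import re
import subprocess

pgid = "5.3f"   # hypothetical PG id, e.g. taken from "ceph pg dump"
out = subprocess.check_output(["ceph", "pg", "map", pgid])
out = out.decode() if isinstance(out, bytes) else out
match = re.search(r"acting \[([0-9, ]+)\]", out)
if match:
    acting = [int(x) for x in match.group(1).split(",")]
    print("acting set: %s -> primary is osd.%d" % (acting, acting[0]))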
How can I debug this further? I searched the web for "waiting for rw locks" and found Wido's blog post [1] about exactly this problem, but I'm not sure how to continue. Our admin says our network should be fine, but what can I do to rule that out?
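One crude test I could run from each node, besides checking NIC/switch counters for drops and retransmits, is a repeated latency probe against the other OSD hosts. A rough sketch (the host names and port are placeholders; the port should be one an OSD actually listens on, e.g. taken from "ceph osd dump"):

#!/usr/bin/env python
# Sketch: measure TCP connect latency from this node to the other OSD nodes.
# Hostnames and port are placeholders -- use the cluster network addresses
# and a port an OSD really listens on.
import socket
import time

HOSTS = ["node1", "node2", "node3", "node4"]   # hypothetical node names
PORT = 6800                                    # adjust to a real OSD port

for host in HOSTS:
    samples = []
    for _ in range(20):
        start = time.time()
        try:
            s = socket.create_connection((host, PORT), timeout=2)
            s.close()
            samples.append((time.time() - start) * 1000.0)
        except socket.error as e:
            print("%s: connect failed: %s" % (host, e))
            break
        time.sleep(0.1)
    if samples:
        print("%s: min %.2f ms / avg %.2f ms / max %.2f ms"
              % (host, min(samples), sum(samples) / len(samples), max(samples)))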
I don't think I have provided information about our cluster yet:
4 nodes, 3 mons, 20 OSDs on
ceph version 0.94.7-84-g8e6f430 (8e6f430683e4d8293e31fd4eb6cb09be96960cfa)
[1]
https://blog.widodh.nl/2016/01/slow-requests-with-ceph-waiting-for-rw-locks/
Thanks!
Eugen
Quoting Brad Hubbard <bhubbard@xxxxxxxxxx>:
On Thu, Jan 12, 2017 at 2:19 AM, Eugen Block <eblock@xxxxxx> wrote:
Hi,
I simply grepped for "slow request" in ceph.log. What exactly do you mean by
"effective OSD"?
If I have this log line:
2017-01-11 [...] osd.16 [...] cluster [WRN] slow request 32.868141 seconds
old, received at 2017-01-11 [...] ack+ondisk+write+known_if_redirected
e12440) currently waiting for subops from 0,12
I assumed that osd.16 is the one causing problems.
0 and/or 12 in this case. 16 is just waiting. Looking at the output of
dump_historic_ops and dump_ops_in_flight on the implicated osds may
help to narrow down what the issue is.
But now that you mention the subops: I only noticed them yesterday and didn't have time yet to investigate further. I'll have a look into the subop messages and report back.
Thanks!
Eugen
Quoting Burkhard Linke <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>:
Hi,
just for clarity:
Did you parse the slow request messages and use the effective OSD in the statistics? Some messages may refer to other OSDs, e.g. "waiting for sub op on OSD X,Y". In that case the reporting OSD is not the root cause, but one of the mentioned OSDs (and I'm currently not aware of a method to determine which of the two OSDs is the cause in the case of 3 replicas...).
Regards,
Burkhard
--
Cheers,
Brad
--
Eugen Block voice : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail : eblock@xxxxxx
Vorsitzende des Aufsichtsrates: Angelika Mozdzen
Sitz und Registergericht: Hamburg, HRB 90934
Vorstand: Jens-U. Mozdzen
USt-IdNr. DE 814 013 983
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com