Hi,
Looking at the output of dump_historic_ops and dump_ops_in_flight
I waited for new slow request messages and dumped the historic_ops into a
file. The reporting OSD shows lots of "waiting for rw locks" messages and a
duration of more than 30 secs:
"age": 366.044746,
"duration": 32.491506,
"type_data": [
"commit sent; apply or cleanup",
{
"client": "client.9664429",
"tid": 130439910
},
[
{
"time": "2017-01-12 10:38:15.227649",
"event": "initiated"
},
{
"time": "2017-01-12 10:38:15.232310",
"event": "reached_pg"
},
{
"time": "2017-01-12 10:38:15.232341",
"event": "waiting for rw locks"
},
{
"time": "2017-01-12 10:38:15.268819",
"event": "reached_pg"
},
[
.
.
.
]
{
"time": "2017-01-12 10:38:45.515055",
"event": "waiting for rw locks"
},
{
"time": "2017-01-12 10:38:46.921095",
"event": "reached_pg"
},
{
"time": "2017-01-12 10:38:46.921157",
"event": "started"
},
{
"time": "2017-01-12 10:38:46.921342",
"event": "waiting for subops from 9,15"
},
{
"time": "2017-01-12 10:38:46.921724",
"event": "commit_queued_for_journal_write"
},
{
"time": "2017-01-12 10:38:46.922186",
"event": "write_thread_in_journal_buffer"
},
{
"time": "2017-01-12 10:38:46.931103",
"event": "sub_op_commit_rec"
},
{
"time": "2017-01-12 10:38:46.968730",
"event": "sub_op_commit_rec"
},
{
"time": "2017-01-12 10:38:47.717770",
"event": "journaled_completion_queued"
},
{
"time": "2017-01-12 10:38:47.718280",
"event": "op_commit"
},
{
"time": "2017-01-12 10:38:47.718359",
"event": "commit_sent"
},
{
"time": "2017-01-12 10:38:47.718890",
"event": "op_applied"
},
{
"time": "2017-01-12 10:38:47.719154",
"event": "done"
}
There were about 70 events "waiting for rw locks", I truncated the output.
Based on the message "waiting for subops from 9,15" I also dumped the
historic_ops for these two OSDs.
Duration on OSD.9
"initiated_at": "2017-01-12 10:38:29.258221",
"age": 54.069919,
"duration": 20.831568,
Duration on OSD.15
"initiated_at": "2017-01-12 10:38:23.695098",
"age": 112.118210,
"duration": 26.452526,
They also contain many "waiting for rw locks" messages, but not as much as
the dump from the reporting OSD.
To me it seems that because two OSDs take a lot of time to process their
requests (only slightly less than 30 secs), it sums up to more than 30 secs
on the reporting (primary?) OSD. Is the reporting OSD always the primary?
How can I debug this further? I searched the web for "waiting for rw locks",
I also found Wido's blog [1] about my exact problem, but I'm not sure how to
continue. Our admin says our network should be fine, but what can I do to
rule that out?
I don't think I have provided information about our cluster yet:
4 nodes, 3 mons, 20 OSDs on
ceph version 0.94.7-84-g8e6f430 (8e6f430683e4d8293e31fd4eb6cb09be96960cfa)
[1]
https://blog.widodh.nl/2016/01/slow-requests-with-ceph-waiting-for-rw-locks/
Thanks!
Eugen
Zitat von Brad Hubbard <bhubbard@xxxxxxxxxx>:
On Thu, Jan 12, 2017 at 2:19 AM, Eugen Block <eblock@xxxxxx> wrote:
Hi,
I simply grepped for "slow request" in ceph.log. What exactly do you mean
by
"effective OSD"?
If I have this log line:
2017-01-11 [...] osd.16 [...] cluster [WRN] slow request 32.868141
seconds
old, received at 2017-01-11 [...] ack+ondisk+write+known_if_redirected
e12440) currently waiting for subops from 0,12
I assumed that osd.16 is the one causing problems.
0 and/or 12 in this case. 16 is just waiting. Looking at the output of
dump_historic_ops and dump_ops_in_flight on the implicated osds may
help to narrow down what the issue is.
But now that you mention
the subops, I only noticed them yesterday, but didn't have the time yet
to
investigate further. I'll have a look into the subops messages and report
back.
Thanks!
Eugen
Zitat von Burkhard Linke
<Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>:
Hi,
just for clarity:
Did you parse the slow request messages and use the effective OSD in the
statistics? Some message may refer to other OSDs, e.g. "waiting for sub
op
on OSD X,Y". The reporting OSD is not the root cause in that case, but
one
of the mentioned OSDs (and I'm currently not aware of a method to
determine
which of the both OSD is the cause in case of 3 replicates.....).
Regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Eugen Block voice : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail : eblock@xxxxxx
Vorsitzende des Aufsichtsrates: Angelika Mozdzen
Sitz und Registergericht: Hamburg, HRB 90934
Vorstand: Jens-U. Mozdzen
USt-IdNr. DE 814 013 983
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Cheers,
Brad
--
Eugen Block voice : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail : eblock@xxxxxx
Vorsitzende des Aufsichtsrates: Angelika Mozdzen
Sitz und Registergericht: Hamburg, HRB 90934
Vorstand: Jens-U. Mozdzen
USt-IdNr. DE 814 013 983