On Wed, Nov 28, 2012 at 5:51 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> Hi Stefan,
>
> On Thu, 15 Nov 2012, Sage Weil wrote:
>> On Thu, 15 Nov 2012, Stefan Priebe - Profihost AG wrote:
>> > On 14.11.2012 15:59, Sage Weil wrote:
>> > > Hi Stefan,
>> > >
>> > > It would be nice to confirm that no clients are waiting on replies for
>> > > these requests; currently we suspect that the OSD request tracking is the
>> > > buggy part. If you query the OSD admin socket you should be able to dump
>> > > requests and see the client IP, and then query the client.
>> > >
>> > > Is it librbd? In that case you likely need to change the config so that
>> > > it is listening on an admin socket ('admin socket = path').
>> >
>> > Yes it is. So i have to specify the admin socket at the KVM host?
>>
>> Right. IIRC the disk line is a ; (or \;) separated list of key/value
>> pairs.
>>
>> > How do i query the admin socket for requests?
>>
>>   ceph --admin-daemon /path/to/socket help
>>   ceph --admin-daemon /path/to/socket objecter_dump   (i think)
>
> Were you able to reproduce this?
>
> Thanks!
> sage

Meanwhile, I did. :) Such requests will always be created if you have
restarted an osd, or marked it out and then back in, and no scrub has
happened in the meantime (i.e. after that operation and before the
request arrives). What is more interesting, the hangup happens not
exactly at the time of the operation, but tens of minutes later.

{ "description": "osd_sub_op(client.1292013.0:45422 4.731 a384cf31\/rbd_data.1415fb1075f187.00000000000000a7\/head\/\/4 [] v 16444'21693 snapset=0=[]:[] snapc=0=[])",
  "received_at": "2012-11-28 03:54:43.094151",
  "age": "27812.942680",
  "duration": "2.676641",
  "flag_point": "started",
  "events": [
        { "time": "2012-11-28 03:54:43.094222",
          "event": "waiting_for_osdmap"},
        { "time": "2012-11-28 03:54:43.386890",
          "event": "reached_pg"},
        { "time": "2012-11-28 03:54:43.386894",
          "event": "started"},
        { "time": "2012-11-28 03:54:43.386973",
          "event": "commit_queued_for_journal_write"},
        { "time": "2012-11-28 03:54:45.360049",
          "event": "write_thread_in_journal_buffer"},
        { "time": "2012-11-28 03:54:45.586183",
          "event": "journaled_completion_queued"},
        { "time": "2012-11-28 03:54:45.586262",
          "event": "sub_op_commit"},
        { "time": "2012-11-28 03:54:45.770792",
          "event": "sub_op_applied"}]}]}
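For reference, a minimal sketch of how to wire up and query the admin
sockets Sage describes above. The socket paths, the example socket file
name, and the exact dump command name are assumptions from my own setup,
so run 'help' on the socket to see what your build actually exposes (on
recent client builds the request dump shows up as 'objecter_requests'
rather than 'objecter_dump'):

  # ceph.conf on the KVM host, so librbd inside qemu opens an admin socket
  # ($name and $pid keep sockets of different VM processes from colliding)
  [client]
      admin socket = /var/run/ceph/$name.$pid.asok

  # list the commands this particular socket supports
  ceph --admin-daemon /var/run/ceph/client.admin.12345.asok help

  # dump the requests the librbd client still considers in flight,
  # to confirm whether it is really waiting on the OSD
  ceph --admin-daemon /var/run/ceph/client.admin.12345.asok objecter_requests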
>>
>> sage
>>
>> > Stefan
>> >
>> > > On Wed, 14 Nov 2012, Stefan Priebe - Profihost AG wrote:
>> > >
>> > > > Hello list,
>> > > >
>> > > > i see this several times. Endless flying slow requests. And they never
>> > > > stop until i restart the mentioned osd.
>> > > >
>> > > > 2012-11-14 10:11:57.513395 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for > 31789.858457 secs
>> > > > 2012-11-14 10:11:57.513399 osd.24 [WRN] slow request 31789.858457 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.0000000000001399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
>> > > > 2012-11-14 10:11:58.513584 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for > 31790.858646 secs
>> > > > 2012-11-14 10:11:58.513586 osd.24 [WRN] slow request 31790.858646 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.0000000000001399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
>> > > > 2012-11-14 10:11:59.513766 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for > 31791.858827 secs
>> > > > 2012-11-14 10:11:59.513768 osd.24 [WRN] slow request 31791.858827 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.0000000000001399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
>> > > > 2012-11-14 10:12:00.513909 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for > 31792.858971 secs
>> > > > 2012-11-14 10:12:00.513916 osd.24 [WRN] slow request 31792.858971 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.0000000000001399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
>> > > > 2012-11-14 10:12:01.514061 osd.24 [WRN] 1 slow requests, 1 included below; oldest blocked for > 31793.859124 secs
>> > > > 2012-11-14 10:12:01.514063 osd.24 [WRN] slow request 31793.859124 seconds old, received at 2012-11-14 01:22:07.654922: osd_op(client.30286.0:6719 rbd_data.75c55bf2fdd7.0000000000001399 [write 282624~4096] 3.3f6d2373) v4 currently delayed
>> > > >
>> > > > When i now restart osd 24 they go away and everything is fine again.
>> > > >
>> > > > Stefan
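One more note on the restart workaround above: before bouncing the OSD it
is worth capturing what its op tracker still has in flight, so the blocked
request can be matched against the client side later. A rough sketch,
assuming the default admin socket path and a sysvinit-managed cluster
(adjust both to your installation):

  # the cluster log above already names the offending OSD (osd.24);
  # save its blocked ops before restarting
  ceph --admin-daemon /var/run/ceph/ceph-osd.24.asok dump_ops_in_flight > /tmp/osd.24-ops.json

  # restart only that OSD; as described above, the stuck requests then clear
  service ceph restart osd.24      # or: /etc/init.d/ceph restart osd.24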