Re: Stuck Request

Interesting, I don't think the request is stalled.  I think we
completed the request, but leaked a reference to the request
structure.  Do you see IO from the clients stall?  What is the output
of ceph -s?  What version are you running (ceph-osd --version)?
-Sam
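
To make the distinction above concrete, here is a rough sketch (not part of the original reply) that scans dump_ops_in_flight output for ops that logged their final events but are still tracked, which is what a leaked reference would look like. The JSON shape is taken from the output quoted below, where "age" is a string. The choice of "sub_op_commit" plus "sub_op_applied" as the completion markers, the function names, and the 30-second threshold are assumptions for illustration, and other Ceph versions may format this output differently.

```python
import json

# Events that, in the trace quoted in this thread, mark a replica sub-op as done.
# Treating them as "finished" is an assumption made for illustration.
DONE_EVENTS = {"sub_op_commit", "sub_op_applied"}


def finished_but_tracked(dump, min_age_secs=30.0):
    """Return ops that logged every DONE_EVENTS entry yet are still listed."""
    suspects = []
    for op in dump.get("ops", []):
        seen = {ev.get("event") for ev in op.get("events", [])}
        # "age" is a quoted string in the output shown in this thread.
        age = float(op.get("age", 0))
        if DONE_EVENTS <= seen and age >= min_age_secs:
            suspects.append(op)
    return suspects


def load_dump(text):
    """Parse the JSON text printed by dump_ops_in_flight."""
    return json.loads(text)
```

Feeding it the complete JSON printed by "ceph --admin-daemon /path/to/osd.1.asok dump_ops_in_flight" (the paste below appears to omit the opening lines) should flag the op in question: its events end in sub_op_applied after about 0.045 seconds, yet its age keeps growing.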

On Mon, Oct 29, 2012 at 10:53 AM, Ian Pye <ianpye@xxxxxxxxx> wrote:
> Guys,
>
> I'm running a three node cluster (version 0.53), and after a while of
> running under constant write load generated by two daemons, I am
> seeing that 1 request is totally blocked:
>
> [WRN] 1 slow requests, 1 included below; oldest blocked for > 7550.891933 secs
> 2012-10-29 10:33:54.689563 osd.0 [WRN] slow request 7550.891933
> seconds old, received at 2012-10-29 08:28:03.797576:
> osd_sub_op(client.4116.0:490 0.3e
> e3aa943e//logger/pg/data/2012-10-29/BWBCK/1351524240/head//0 [] v
> 13'37 snapset=0=[]:[] snapc=0=[]) v7 currently started
>
> ceph --admin-daemon /path/to/osd.1.asok dump_ops_in_flight gives:
> "ops": [
>         { "description": "osd_sub_op(client.4116.0:490 0.3e
> e3aa943e\/\/logger\/pg\/data\/2012-10-29\/BWBCK\/1351524240\/head\/\/0
> [] v 13'37 snapset=0=[]:[] snapc=0=[])",
>           "received_at": "2012-10-29 08:28:03.797576",
>           "age": "8348.393528",
>           "duration": "0.045426",
>           "flag_point": "started",
>           "events": [
>                 { "time": "2012-10-29 08:28:03.805648",
>                   "event": "waiting_for_osdmap"},
>                 { "time": "2012-10-29 08:28:03.806203",
>                   "event": "reached_pg"},
>                 { "time": "2012-10-29 08:28:03.806222",
>                   "event": "started"},
>                 { "time": "2012-10-29 08:28:03.806299",
>                   "event": "commit_queued_for_journal_write"},
>                 { "time": "2012-10-29 08:28:03.807905",
>                   "event": "write_thread_in_journal_buffer"},
>                 { "time": "2012-10-29 08:28:03.808154",
>                   "event": "journaled_completion_queued"},
>                 { "time": "2012-10-29 08:28:03.809422",
>                   "event": "sub_op_commit"},
>                 { "time": "2012-10-29 08:28:03.843002",
>                   "event": "sub_op_applied"}]}]}
>
> Restarting the OSD kills this request. Is this a bug, and is there a
> way to stop a request without restarting the OSD?
>
> Thanks,
>
> Ian
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html