Re: Understanding the output of dump_historic_ops

ceph@xxxxxxxxxx · Sun, 02 Sep 2018 20:02:04 +0200

Hi Ronnie,

On 2 September 2018 at 13:32:05 CEST, Ronnie Lazar <ronnie@xxxxxxxxxxxxxxx> wrote:
>Hello,
>
>I'm trying to understand the output of the dump_historic_ops admin
>socket command.
>I can't find any information on the meaning of the different states
>that an op can be in.
>For example, in the following excerpt:
>        {
>            "description": "MOSDPGPush(1.a5 421/239
>[PushOp(1:a534ca1b:::rbd_data.cb0f2fd796ae.000000000000317e:head,
>version: 230'121959, data_included:
>[20480~4096,40960~4096,102400~4096,110592~4096,122880~4096,147456~4096,266240~4096,274432~4096,434176~4096,593920~4096,
>655360~4096,774144~4096,1019904~4096,1114112~4096,1134592~4096,1142784~4096,1204224~4096,1323008~4096,1339392~4096,1359872~4096,
>1445888~4096,1454080~4096,1617920~4096,1712128~8192,1757184~4096,1953792~4096,1978368~4096,2134016~4096,2314240~4096,2650112~4096,
>2662400~4096,2678784~4096,2686976~4096,2744320~4096,2760704~4096,2875392~4096,2945024~4096,3330048~4096,3444736~8192,3493888~4096,
>3502080~4096,3522560~4096,3608576~4096,3743744~4096,3805184~4096,3915776~4096,4079616~4096,4096000~4096,4112384~4096,4128768~4096],
>data_size: 212992, omap_header_size: 0, omap_entries_size: 0,
>attrset_size: 2, recovery_info:
>ObjectRecoveryInfo(1:a534ca1b:::rbd_data.cb0f2fd796ae.000000000000317e:head@230'121959,
>size: 4132864, copy_subset: [0~4132864], clone_subset: {}, snapset:
>0=[]:[]), after_progress: ObjectRecoveryProgress(!first,
>data_recovered_to:4132864, data_complete:true, omap_recovered_to:,
>omap_complete:true, error:false), before_progress:
>ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false,
>omap_recovered_to:, omap_complete:false, error:false))])",
>            "initiated_at": "2018-09-02 11:20:32.486670",
>            "age": 594.163684,
>            "duration": 2.162485,
>            "type_data": {
>                "flag_point": "started",
>                "events": [
>                    {
>                        "time": "2018-09-02 11:20:32.486670",
>                        "event": "initiated"
>                    },
>                    {
>                        "time": "2018-09-02 11:20:32.487195",
>                        "event": "queued_for_pg"
>                    },

I guess this is where you should have a look.

The op was queued for its PG, but it took about 2.16 seconds (11:20:32.487195 to 11:20:34.648092, see the next event) until the PG could handle it. Why? Hmm, perhaps a busy disk or CPU on that OSD?
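
If you want to see at a glance which step ate the time, you can diff the timestamps of consecutive events. Below is a rough Python sketch of that arithmetic, not an official Ceph tool. The events are copied by hand from the op quoted in this mail; the helper names (event_gaps, slowest_gap) and the timestamp format string are my own assumptions, based only on how this particular dump looks.

```python
from datetime import datetime

# Events copied from the quoted op (its type_data -> events list).
events = [
    {"time": "2018-09-02 11:20:32.486670", "event": "initiated"},
    {"time": "2018-09-02 11:20:32.487195", "event": "queued_for_pg"},
    {"time": "2018-09-02 11:20:34.648092", "event": "reached_pg"},
    {"time": "2018-09-02 11:20:34.648095", "event": "started"},
    {"time": "2018-09-02 11:20:34.649155", "event": "done"},
]

# Assumption: timestamp layout as it appears in this dump.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def event_gaps(events):
    """Return (from_event, to_event, seconds) for each consecutive pair of events."""
    gaps = []
    for prev, cur in zip(events, events[1:]):
        t0 = datetime.strptime(prev["time"], TIME_FORMAT)
        t1 = datetime.strptime(cur["time"], TIME_FORMAT)
        gaps.append((prev["event"], cur["event"], (t1 - t0).total_seconds()))
    return gaps


def slowest_gap(events):
    """Return the consecutive pair of events with the largest time gap."""
    return max(event_gaps(events), key=lambda gap: gap[2])


if __name__ == "__main__":
    for prev, cur, secs in event_gaps(events):
        print(f"{prev:>15} -> {cur:<15} {secs:.6f} s")
    print("slowest step:", slowest_gap(events))
```

For this op, the step from queued_for_pg to reached_pg comes out at 2.160897 s, and all gaps together add up to the reported "duration" of 2.162485 s. To run it over a whole dump you would load the JSON from "ceph daemon osd.<id> dump_historic_ops" and loop over the list of ops; the name of the top-level key holding that list may differ between releases, so check your own output first.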

>                    {
>                        "time": "2018-09-02 11:20:34.648092",
>                        "event": "reached_pg"
>                    },

HTH
- Mehmet

>                    {
>                        "time": "2018-09-02 11:20:34.648095",
>                        "event": "started"
>                    },
>                    {
>                        "time": "2018-09-02 11:20:34.649155",
>                        "event": "done"
>                    }
>                ]
>            }
>        },
>
>Seems like I have an operation that was delayed over 2 seconds in
>queued_for_pg state.
>What does that mean? What was it waiting for?
>
>Regards,
>*Ronnie Lazar*
>*R&D*
>
>T: +972 77 556-1727
>E: ronnie@xxxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com