Hello Jason,

On 16.05.2017 21:32, Jason Dillaman wrote:
> Thanks for the update. In the ops dump provided, the objecter is
> saying that OSD 46 hasn't responded to the deletion request of object
> rbd_data.e10ca56b8b4567.000000000000311c.
>
> Perhaps run "ceph daemon osd.46 dump_ops_in_flight" or "...
> dump_historic_ops" to see if that op is in the list?

We've disabled the op tracker for performance reasons, since we're
running SSD-only storage ;-( Can I re-enable the op tracker at runtime
using ceph tell, then reproduce the problem and check what gets stuck
again? Or should I generate an rbd log from the client instead?

> You can also run
> "ceph osd map <pool name> rbd_data.e10ca56b8b4567.000000000000311c" to
> verify that OSD 46 is the primary OSD for that object's PG.

Yes, it is:

osdmap e886758 pool 'cephstor1' (5) object
'rbd_data.e10ca56b8b4567.000000000000311c' -> pg 5.bd9616ad (5.6ad) ->
up ([46,29,30], p46) acting ([46,29,30], p46)

Greets,
Stefan

> On Tue, May 16, 2017 at 3:14 PM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx> wrote:
>> Hello Jason,
>>
>> I'm happy to tell you that I currently have one VM where I can
>> reproduce the problem.
>>
>>> The best option would be to run "gcore" against the running VM whose
>>> IO is stuck, compress the dump, and use the "ceph-post-file" to
>>> provide the dump. I could then look at all the Ceph data structures to
>>> hopefully find the issue.
>>
>> I've saved the dump, but it contains sensitive information, so I won't
>> upload it to a public server. I'll send you a private email with a
>> private server where you can download the core dump. Thanks!
>>
>>> Enabling debug logs after the IO has stuck will most likely be of
>>> little value since it won't include the details of which IOs are
>>> outstanding. You could attempt to use "ceph --admin-daemon
>>> /path/to/stuck/vm/asok objecter_requests" to see if any IOs are just
>>> stuck waiting on an OSD to respond.
>>
>> This is the output:
>>
>> # ceph --admin-daemon
>> /var/run/ceph/ceph-client.admin.5295.140214539927552.asok objecter_requests
>> {
>>     "ops": [
>>         {
>>             "tid": 384632,
>>             "pg": "5.bd9616ad",
>>             "osd": 46,
>>             "object_id": "rbd_data.e10ca56b8b4567.000000000000311c",
>>             "object_locator": "@5",
>>             "target_object_id": "rbd_data.e10ca56b8b4567.000000000000311c",
>>             "target_object_locator": "@5",
>>             "paused": 0,
>>             "used_replica": 0,
>>             "precalc_pgid": 0,
>>             "last_sent": "2.28554e+06s",
>>             "attempts": 1,
>>             "snapid": "head",
>>             "snap_context": "a07c2=[]",
>>             "mtime": "2017-05-16 21:03:22.0.196102s",
>>             "osd_ops": [
>>                 "delete"
>>             ]
>>         }
>>     ],
>>     "linger_ops": [
>>         {
>>             "linger_id": 1,
>>             "pg": "5.5f3bd635",
>>             "osd": 17,
>>             "object_id": "rbd_header.e10ca56b8b4567",
>>             "object_locator": "@5",
>>             "target_object_id": "rbd_header.e10ca56b8b4567",
>>             "target_object_locator": "@5",
>>             "paused": 0,
>>             "used_replica": 0,
>>             "precalc_pgid": 0,
>>             "snapid": "head",
>>             "registered": "1"
>>         }
>>     ],
>>     "pool_ops": [],
>>     "pool_stat_ops": [],
>>     "statfs_ops": [],
>>     "command_ops": []
>> }
>>
>> Greets,
>> Stefan
>>
>> On 16.05.2017 15:44, Jason Dillaman wrote:
>>> On Tue, May 16, 2017 at 2:12 AM, Stefan Priebe - Profihost AG
>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>> 3.) It still happens on pre-Jewel images, even when they got
>>>> restarted / killed and reinitialized. In that case they have the
>>>> asok socket available for now. Should I issue any command to the
>>>> socket to get a log out of the hanging VM? Qemu is still responding,
>>>> just the Ceph / disk I/O gets stalled.
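
(Regarding my op tracker question above: I assume something like the
following would switch it back on at runtime on the primary OSD,
provided osd_enable_op_tracker is picked up via injectargs without a
restart:)

ceph tell osd.46 injectargs '--osd_enable_op_tracker=true'
ceph daemon osd.46 dump_ops_in_flight
ceph daemon osd.46 dump_historic_ops

The two dump commands would have to be run on the node hosting osd.46
after reproducing the hang; they should show whether the delete op on
rbd_data.e10ca56b8b4567.000000000000311c ever arrives there.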
>>>
>>> The best option would be to run "gcore" against the running VM whose
>>> IO is stuck, compress the dump, and use the "ceph-post-file" to
>>> provide the dump. I could then look at all the Ceph data structures to
>>> hopefully find the issue.
>>>
>>> Enabling debug logs after the IO has stuck will most likely be of
>>> little value since it won't include the details of which IOs are
>>> outstanding. You could attempt to use "ceph --admin-daemon
>>> /path/to/stuck/vm/asok objecter_requests" to see if any IOs are just
>>> stuck waiting on an OSD to respond.
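
(For completeness, the core dump was captured roughly as sketched below.
<qemu-pid> is a placeholder for the VM's qemu process ID, and the
ceph-post-file step is the one I skipped because the dump contains
sensitive data:)

gcore -o /tmp/vm-core <qemu-pid>
xz /tmp/vm-core.<qemu-pid>
ceph-post-file /tmp/vm-core.<qemu-pid>.xz

gcore writes the dump to /tmp/vm-core.<qemu-pid>; instead of the last
step I'll provide the private download mentioned above.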