On Tue, May 16, 2017 at 2:12 AM, Stefan Priebe - Profihost AG
<s.priebe@xxxxxxxxxxxx> wrote:
> 3.) it still happens on pre jewel images even when they got restarted /
> killed and reinitialized. In that case they've the asok socket available
> for now. Should i issue any command to the socket to get log out of the
> hanging vm? Qemu is still responding just ceph / disk i/O gets stalled.

The best option would be to run "gcore" against the running VM whose IO is
stuck, compress the dump, and use "ceph-post-file" to upload it. I could
then look at all the Ceph data structures to hopefully find the issue.
Enabling debug logs after the IO is already stuck will most likely be of
little value, since the logs won't include the details of which IOs are
outstanding.

You could also try
"ceph --admin-daemon /path/to/stuck/vm/asok objecter_requests"
to see if any IOs are just stuck waiting on an OSD to respond.

--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
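
For reference, the steps suggested in the reply could be strung together roughly as below. This is only a sketch under assumptions: VM_PID, OUT, the description string, and the dry-run wrapper are placeholders that are not from the thread, and the asok path is the elided one from the message. It checks the outstanding requests first only because that is cheap, then takes the core dump. Note that gcore needs gdb installed, and the dump will be roughly as large as the VM's memory.

```shell
#!/bin/sh
# Sketch only: all three values below are placeholders to set per VM.
VM_PID="${VM_PID:-12345}"                # hypothetical PID of the stuck qemu process
ASOK="${ASOK:-/path/to/stuck/vm/asok}"   # admin socket path, elided as in the message
OUT="${OUT:-/tmp/qemu-stuck}"            # hypothetical prefix for the core file

# Print each command; execute it only when DRY_RUN is explicitly set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

collect() {
    # 1. List in-flight requests to see whether any are waiting on an OSD.
    run ceph --admin-daemon "$ASOK" objecter_requests
    # 2. Dump the process memory; gcore writes the core to "$OUT.$VM_PID".
    run gcore -o "$OUT" "$VM_PID"
    # 3. Compress the dump before uploading it.
    run gzip "$OUT.$VM_PID"
    # 4. Upload the compressed dump with a short description.
    run ceph-post-file -d "stuck librbd IO in qemu" "$OUT.$VM_PID.gz"
}

collect
```

As written, the script only prints the commands. Running it with DRY_RUN=0 and real values for VM_PID, ASOK and OUT would execute them.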