It looks like it's just a ping message in that capture. Are you saying
that you restarted OSD 46 and the problem persisted?

On Tue, May 16, 2017 at 4:02 PM, Stefan Priebe - Profihost AG
<s.priebe@xxxxxxxxxxxx> wrote:
> Hello,
>
> While reproducing the problem, objecter_requests looks like this:
>
> {
>     "ops": [
>         {
>             "tid": 42029,
>             "pg": "5.bd9616ad",
>             "osd": 46,
>             "object_id": "rbd_data.e10ca56b8b4567.000000000000311c",
>             "object_locator": "@5",
>             "target_object_id": "rbd_data.e10ca56b8b4567.000000000000311c",
>             "target_object_locator": "@5",
>             "paused": 0,
>             "used_replica": 0,
>             "precalc_pgid": 0,
>             "last_sent": "2.28854e+06s",
>             "attempts": 1,
>             "snapid": "head",
>             "snap_context": "a07c2=[]",
>             "mtime": "2017-05-16 21:53:22.0.069541s",
>             "osd_ops": [
>                 "delete"
>             ]
>         }
>     ],
>     "linger_ops": [
>         {
>             "linger_id": 1,
>             "pg": "5.5f3bd635",
>             "osd": 17,
>             "object_id": "rbd_header.e10ca56b8b4567",
>             "object_locator": "@5",
>             "target_object_id": "rbd_header.e10ca56b8b4567",
>             "target_object_locator": "@5",
>             "paused": 0,
>             "used_replica": 0,
>             "precalc_pgid": 0,
>             "snapid": "head",
>             "registered": "1"
>         }
>     ],
>     "pool_ops": [],
>     "pool_stat_ops": [],
>     "statfs_ops": [],
>     "command_ops": []
> }
>
> Yes, they have an established TCP connection: Qemu <=> osd.46. Attached
> is a pcap file of the traffic between them when it got stuck.
>
> Greets,
> Stefan
>
> On 16.05.2017 at 21:45, Jason Dillaman wrote:
>> On Tue, May 16, 2017 at 3:37 PM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>> We've enabled the op tracker for performance reasons while using
>>> SSD-only storage ;-(
>>
>> Disabled you mean?
>>
>>> Can I enable the op tracker using ceph osd tell, then reproduce the
>>> problem and check what is stuck again? Or should I generate an rbd log
>>> from the client?
>>
>> From a super-quick glance at the code, it looks like that isn't a
>> dynamic setting.
>> Of course, it's possible that if you restart OSD 46
>> to enable the op tracker, the stuck op will clear itself and the VM
>> will resume. You could attempt to generate a gcore of OSD 46 to see if
>> information on that op could be extracted via the debugger, but no
>> guarantees.
>>
>> You might want to verify that the stuck client and OSD 46 have an
>> actual established TCP connection as well before doing any further
>> actions.

--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
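The objecter_requests dump quoted above can be triaged mechanically: requests still in flight sit in the "ops" list, each tagged with the OSD it was sent to, while "linger_ops" are typically long-lived watch registrations (here the one on the rbd_header object) that stay outstanding by design. The following is a minimal Python sketch, not a Ceph tool. It assumes the JSON shape shown in the thread (fields "ops", "osd", "tid", "pg", "object_id", "osd_ops", "attempts"); the function name pending_ops_by_osd and the trimmed SAMPLE_DUMP are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical, trimmed copy of the objecter_requests dump quoted in the thread.
SAMPLE_DUMP = """
{
    "ops": [
        {
            "tid": 42029,
            "pg": "5.bd9616ad",
            "osd": 46,
            "object_id": "rbd_data.e10ca56b8b4567.000000000000311c",
            "attempts": 1,
            "osd_ops": ["delete"]
        }
    ],
    "linger_ops": [
        {
            "linger_id": 1,
            "osd": 17,
            "object_id": "rbd_header.e10ca56b8b4567"
        }
    ]
}
"""


def pending_ops_by_osd(dump_text):
    """Group in-flight ops from an objecter_requests JSON dump by target OSD.

    Only the "ops" list is inspected; "linger_ops" (watch registrations)
    are skipped because they are expected to stay outstanding. Field names
    are assumed to match the dump quoted in the thread.
    """
    dump = json.loads(dump_text)
    by_osd = defaultdict(list)
    for op in dump.get("ops", []):
        by_osd[op["osd"]].append(
            {
                "tid": op["tid"],
                "pg": op.get("pg"),
                "object_id": op["object_id"],
                "osd_ops": op.get("osd_ops", []),
                "attempts": op.get("attempts"),
            }
        )
    return dict(by_osd)


if __name__ == "__main__":
    # Print one line per pending op, grouped by OSD id.
    for osd, ops in sorted(pending_ops_by_osd(SAMPLE_DUMP).items()):
        for op in ops:
            print(f"osd.{osd}: tid={op['tid']} pg={op['pg']} "
                  f"{op['object_id']} {op['osd_ops']} attempts={op['attempts']}")
```

Run against the trimmed sample, it prints a single line, `osd.46: tid=42029 pg=5.bd9616ad rbd_data.e10ca56b8b4567.000000000000311c ['delete'] attempts=1`, pointing at the same OSD the thread focuses on. Grouping by OSD is a deliberate choice: if several stuck ops all target one OSD, that OSD (or the connection to it) is the likelier suspect than the client.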