On 04/24/17 22:23, Phil Lacroute wrote:
Jason,
Thanks for the suggestion. That seems to show it is not the OSD that got stuck:
ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
…
2017-04-24 13:13:49.761076 7f739aefc700 1 -- 192.168.206.17:0/1250293899 --> 192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38 rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con 0x7f737c0064e0
…
2017-04-24 13:14:04.756328 7f73a2880700 1 -- 192.168.206.17:0/1250293899 --> 192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con 0x7f737c0064e0
ceph0:~$ sudo ceph pg map 1.af6f1e38
osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2]
ceph3:~$ sudo ceph daemon osd.11 ops
{
    "ops": [],
    "num_ops": 0
}
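
(In case it's relevant: debug/ceph.conf is just my normal ceph.conf with client-side logging turned up, roughly along these lines; I'm not sure which of the levels actually matters here:

[client]
    # log destination is just whatever is convenient on the client host
    log file = /tmp/rbd-client.log
    debug ms = 1
    debug rbd = 20
    debug objecter = 20

That's where the messenger lines above come from.)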
I repeated this a few times and it’s always the same command and same placement group that hangs, but OSD11 has no ops (and neither do OSD16 and OSD2, although I think that’s expected).

Is there other tracing I should do on the OSD, or something more to look at on the client?
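
(The only other OSD-side check I can think of is the admin socket op history; assuming those commands behave like the "ops" command above, something like:

ceph3:~$ sudo ceph daemon osd.11 dump_ops_in_flight
ceph3:~$ sudo ceph daemon osd.11 dump_historic_ops

though if the request never reaches osd.11 I would expect those to come back empty as well.)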
Thanks,
Phil
Does it still happen if you disable exclusive-lock, or, as a separate test, fast-diff and object-map?

I have a similar problem where VMs with those three features enabled hang and need a kill -9; without those features they never hang.
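
If you want to try it, something along these lines should turn the features off on the image from your trace (fast-diff depends on object-map and object-map depends on exclusive-lock, so disable them in that order):

rbd feature disable app/image1 fast-diff
rbd feature disable app/image1 object-map
rbd feature disable app/image1 exclusive-lock

You can re-enable them later with "rbd feature enable" in the reverse order (plus "rbd object-map rebuild app/image1" once object-map is back on).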