We tried the default configuration without additional parameters, but it still hangs. How can we see an OSD queue? (Some command sketches for inspecting the queues and reproducing the hang are appended after the quoted thread below.)

15.12.2014, 16:11, "Tomasz Kuzemko" <tomasz.kuzemko@xxxxxxx>:
> Try lowering "filestore max sync interval" and "filestore min sync
> interval". It looks like during the hung period data is flushed from
> some overly big buffer.
>
> If this does not help, you can monitor perf stats on the OSDs to see if some
> queue is unusually large.
>
> --
> Tomasz Kuzemko
> tomasz.kuzemko@xxxxxxx
>
> On Thu, Dec 11, 2014 at 07:57:48PM +0300, reistlin87 wrote:
>> Hi all!
>>
>> We have an annoying problem - when we launch intensive reads over rbd, the client on which the image is mounted hangs in this state:
>>
>> Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> sda        0.00    0.00  0.00  1.20   0.00   0.00      8.00      0.00   0.00     0.00     0.00   0.00   0.00
>> dm-0       0.00    0.00  0.00  1.20   0.00   0.00      8.00      0.00   0.00     0.00     0.00   0.00   0.00
>> dm-1       0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
>> rbd0       0.00    0.00  0.00  0.00   0.00   0.00      0.00     32.00   0.00     0.00     0.00   0.00 100.00
>>
>> Only a reboot helps. The logs are clean.
>>
>> The fastest way to trigger the hang is to run an fio read with a 512K block size; 4K usually works fine. But the client may also hang without fio - simply under heavy load.
>>
>> We have tried different versions of the Linux kernel and Ceph - at the moment the OSDs and MONs run Ceph 0.87-1 on Linux kernel 3.18. On the clients we have tried the latest builds from http://gitbuilder.ceph.com/, for example Ceph 0.87-68. Through libvirt everything works fine - we also use KVM and stgt (but stgt is slow).
>>
>> Here is my config:
>> [global]
>> fsid = 566d9cab-793e-47e0-a0cd-e5da09f8037a
>> mon_initial_members = srt-mon-001-000002,amz-mon-001-000601,db24-mon-001-000105
>> mon_host = 10.201.20.31,10.203.20.56,10.202.20.58
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>> public network = 10.201.20.0/22
>> cluster network = 10.212.36.0/22
>> osd crush update on start = false
>>
>> [mon]
>> debug mon = 0
>> debug paxos = 0/0
>> debug auth = 0
>>
>> [mon.srt-mon-001-000002]
>> host = srt-mon-001-000002
>> mon addr = 10.201.20.31:6789
>> [mon.db24-mon-001-000105]
>> host = db24-mon-001-000105
>> mon addr = 10.202.20.58:6789
>> [mon.amz-mon-001-000601]
>> host = amz-mon-001-000601
>> mon addr = 10.203.20.56:6789
>>
>> [osd]
>> osd crush update on start = false
>> osd mount options xfs = "rw,noatime,inode64,allocsize=4M"
>> osd mkfs type = xfs
>> osd mkfs options xfs = "-f -i size=2048"
>> osd op threads = 20
>> osd disk threads = 8
>> journal block align = true
>> journal dio = true
>> journal aio = true
>> osd recovery max active = 1
>> filestore max sync interval = 100
>> filestore min sync interval = 10
>> filestore queue max ops = 2000
>> filestore queue max bytes = 536870912
>> filestore queue committing max ops = 2000
>> filestore queue committing max bytes = 536870912
>> osd max backfills = 1
>> osd client op priority = 63
>>
>> [osd.5]
>> host = srt-osd-001-050204
>> [osd.6]
>> host = srt-osd-001-050204
>> [osd.7]
>> host = srt-osd-001-050204
>> [osd.8]
>> host = srt-osd-001-050204
>> [osd.109]
>> ....
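
To answer the question at the top of this message - not taken from the thread, just a minimal sketch of how the OSD queues and perf counters could be inspected through the admin socket, assuming the default socket path and an example id of osd.5 (substitute your own OSD ids):

# dump all perf counters for osd.5; the filestore and op queue counters show queue depth and bytes
ceph daemon osd.5 perf dump

# equivalent form if the "ceph daemon" shortcut is not available on this version
ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok perf dump

# list requests currently stuck in flight on that OSD during the hang
ceph daemon osd.5 dump_ops_in_flight

Running "perf dump" repeatedly while the client is hung and watching which counter keeps growing should point at the queue Tomasz mentions.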
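
Likewise, a hedged sketch of the change Tomasz suggests - lowering the filestore sync intervals from the values in the posted config back toward the stock defaults; the exact numbers below are illustrative, not from the thread:

[osd]
filestore max sync interval = 5      ; posted config uses 100
filestore min sync interval = 0.01   ; posted config uses 10

The new values could also be pushed to running OSDs without a restart, with something like:

ceph tell osd.* injectargs '--filestore_max_sync_interval 5 --filestore_min_sync_interval 0.01'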
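
Finally, a sketch of the kind of fio read that reproduces the hang described above; the device name and job parameters are assumptions, not values taken from the thread:

# sequential 512K direct reads from the mapped rbd device
fio --name=rbd-read-512k --filename=/dev/rbd0 --rw=read --bs=512k \
    --ioengine=libaio --direct=1 --iodepth=32 --runtime=120 --time_based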