IO Hang on rbd

Hi all!

We have an annoying problem: when we launch intensive reads against an rbd image, the client on which the image is mapped hangs in this state:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    1.20     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    1.20     0.00     0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
rbd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00    32.00    0.00    0.00    0.00   0.00 100.00
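This is extended iostat output, collected with something like:

        iostat -xm 1

Note rbd0 stuck at 100% util with avgqu-sz 32 but zero throughput, while the local disks sit idle.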

Only a reboot helps. The logs are clean.

The fastest way to trigger the hang is to run an fio read job with a 512K block size; 4K usually works fine. But the client can hang even without fio, simply under heavy load.
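For example, a job of roughly this shape triggers it (the iodepth, engine and runtime here are just illustrative; the large-block direct reads are the important part):

        fio --name=rbd-read --filename=/dev/rbd0 --rw=read \
            --bs=512k --ioengine=libaio --iodepth=32 \
            --direct=1 --runtime=60 --time_based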

We have tried different versions of the Linux kernel and Ceph. Right now the OSDs and MONs run Ceph 0.87-1 on Linux kernel 3.18. On the clients we have tried the latest builds from http://gitbuilder.ceph.com/, for example Ceph 0.87-68. Through libvirt everything works fine; we also use KVM and stgt (but stgt is slow).
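To be clear about the client path: the hang happens with the kernel rbd client, i.e. an image mapped and mounted roughly like this (pool/image/user names below are placeholders, not our real ones):

        rbd map <pool>/<image> --id <user>    # kernel client, gives /dev/rbd0
        mount /dev/rbd0 /mnt                  # reads through this mount hang

The libvirt/KVM guests go through librbd rather than the kernel client, which is presumably why they are unaffected.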

Here is my config:
[global]
        fsid = 566d9cab-793e-47e0-a0cd-e5da09f8037a
        mon_initial_members = srt-mon-001-000002,amz-mon-001-000601,db24-mon-001-000105
        mon_host = 10.201.20.31,10.203.20.56,10.202.20.58
        auth_cluster_required = cephx
        auth_service_required = cephx
        auth_client_required = cephx
        filestore_xattr_use_omap = true
        public network = 10.201.20.0/22
        cluster network = 10.212.36.0/22
        osd crush update on start = false
[mon]
        debug mon = 0
        debug paxos = 0/0
        debug auth = 0

[mon.srt-mon-001-000002]
        host = srt-mon-001-000002
        mon addr = 10.201.20.31:6789
[mon.db24-mon-001-000105]
        host = db24-mon-001-000105
        mon addr = 10.202.20.58:6789
[mon.amz-mon-001-000601]
        host = amz-mon-001-000601
        mon addr = 10.203.20.56:6789
[osd]
        osd crush update on start = false
        osd mount options xfs = "rw,noatime,inode64,allocsize=4M"
        osd mkfs type = xfs
        osd mkfs options xfs = "-f -i size=2048"
        osd op threads = 20
        osd disk threads = 8
        journal block align = true
        journal dio = true
        journal aio = true
        osd recovery max active = 1
        filestore max sync interval = 100
        filestore min sync interval = 10
        filestore queue max ops = 2000
        filestore queue max bytes = 536870912
        filestore queue committing max ops = 2000
        filestore queue committing max bytes = 536870912
        osd max backfills = 1
        osd client op priority = 63
[osd.5]
        host = srt-osd-001-050204
[osd.6]
        host = srt-osd-001-050204
[osd.7]
        host = srt-osd-001-050204
[osd.8]
        host = srt-osd-001-050204
[osd.109]
....