long blocking with writes on rbds

Hi, I'm seeing sporadic, very poor performance running Ceph. Right now mkfs, even with nodiscard, takes 30 minutes or more. These delays happen often but irregularly; there seems to be no common denominator. Clearly, however, they make it impossible to deploy Ceph in production.

I reported this problem earlier on Ceph's IRC and was told to pass nodiscard to mkfs. That didn't help. Here is the command I'm using to format an rbd:

mkfs.ext4 -t ext4 -m 0 -b 4096 -E nodiscard /dev/rbd1
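In case it helps with diagnosis, this is roughly how I've been timing the runs and watching for blocked requests while the format is going (a sketch; the device name matches my setup above):

```shell
# In one terminal, watch cluster events; slow/blocked request
# warnings from the OSDs show up in this stream:
ceph -w

# In another terminal, time the format itself:
time mkfs.ext4 -t ext4 -m 0 -b 4096 -E nodiscard /dev/rbd1

# Afterwards, ask for details on anything the cluster flagged:
ceph health detail
```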

Ceph says everything is okay:

    cluster e96e10d3-ad2b-467f-9fe4-ab5269b70206
     health HEALTH_OK
     monmap e1: 3 mons at {a=192.168.224.4:6789/0,b=192.168.232.4:6789/0,c=192.168.240.4:6789/0}, election epoch 12, quorum 0,1,2 a,b,c
     osdmap e972: 6 osds: 6 up, 6 in
      pgmap v4821: 4400 pgs, 44 pools, 5157 MB data, 1654 objects
            46138 MB used, 1459 GB / 1504 GB avail
                4400 active+clean

And here's my crush map:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5

# types
type 0 osd
type 1 district
type 2 region

# buckets
district district-1 {
    id -1        # do not change unnecessarily
    # weight 3.000
    alg straw
    hash 0    # rjenkins1
    item osd.1 weight 1.000
    item osd.2 weight 1.000
    item osd.5 weight 1.000
}
district district-2 {
    id -2        # do not change unnecessarily
    # weight 3.000
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 1.000
    item osd.3 weight 1.000
    item osd.4 weight 1.000
}
region ec2 {
    id -3        # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0    # rjenkins1
    item district-1 weight 1.000
    item district-2 weight 1.000
}

# rules
rule rule-district-1 {
    ruleset 0
    type replicated
    min_size 2
    max_size 3
    step take district-1
    step chooseleaf firstn 0 type osd
    step emit
}
rule rule-district-2 {
    ruleset 1
    type replicated
    min_size 2
    max_size 3
    step take district-2
    step chooseleaf firstn 0 type osd
    step emit
}

# end crush map
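For anyone who wants to poke at the map, it can be recompiled and its rules exercised with crushtool (the filenames here are arbitrary):

```shell
# Compile the decompiled map above back into binary form:
crushtool -c crushmap.txt -o crushmap.bin

# Exercise rule 0 (rule-district-1) at 2 replicas and print
# which OSDs each placement maps to:
crushtool --test -i crushmap.bin --rule 0 --num-rep 2 --show-mappings
```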

Does anyone have any insight into diagnosing this problem?

Jeff
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



