Hi ceph-users and ceph-devel,

Merry Christmas and Happy New Year!

We have a Ceph cluster with radosgw; our customers use the S3 API to access it. The basic information of the cluster is:

bash-4.1$ ceph -s
  cluster b9cb3ea9-e1de-48b4-9e86-6921e2c537d2
   health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
   monmap e1: 3 mons at {osd151=10.194.0.68:6789/0,osd152=10.193.207.130:6789/0,osd153=10.193.207.131:6789/0}, election epoch 40, quorum 0,1,2 osd151,osd152,osd153
   osdmap e129885: 787 osds: 758 up, 758 in
    pgmap v1884502: 22203 pgs: 22125 active+clean, 1 active+clean+scrubbing, 1 active+clean+inconsistent, 76 active+clean+scrubbing+deep; 96319 GB data, 302 TB used, 762 TB / 1065 TB avail
   mdsmap e1: 0/0/1 up

(Note: when the latency peak happened, there was no scrubbing, recovering or backfilling going on.)

The performance of the cluster (the workload is write-only) had been stable until Dec 25th, when our monitoring of the radosgw access logs showed a significant increase in both average latency and 99th-percentile latency.

I then picked one OSD host and grepped its logs for slow requests, and found that most of the slow requests were waiting for sub-ops. Take host osd22 as an example; it hosts osd.561 through osd.571:

-bash-4.1$ for i in {561..571}; do grep "slow request" ceph-osd.$i.log | grep "2013-12-25 16" | grep osd_op | grep -oP "\d+,\d+"; done > ~/slow_osd.txt
-bash-4.1$ cat ~/slow_osd.txt | sort | uniq -c | sort -nr
   3586 656,598
    289 467,629
    284 598,763
    279 584,598
    203 172,598
    182 598,6
    155 629,646
     83 631,598
     65 631,593
     21 616,629
     20 609,671
     20 609,390
     13 609,254
     12 702,629
     12 629,641
     11 665,613
     11 593,724
     11 361,591
     10 591,709
      9 681,609
      9 609,595
      9 591,772
      8 613,662
      8 575,591
      7 674,722
      7 609,603
      6 585,605
      5 613,691
      5 293,629
      4 774,591
      4 717,591
      4 613,776
      4 538,629
      4 485,629
      3 702,641
      3 608,629
      3 593,580
      3 591,676

It turns out that most of the slow requests were waiting for osd.598 and osd.629. I ran the same procedure on another host and saw the same pattern.

I then went to the host carrying osd.598 and dumped the perf counters of its OSDs (osd.594 through osd.604) for comparison. (~/do_calc_op_latency.pl is a small local helper; a rough sketch of the equivalent calculation is appended at the end of this message.)

-bash-4.1$ for i in {594..604}; do sudo ceph --admin-daemon /var/run/ceph/ceph-osd.$i.asok perf dump | ~/do_calc_op_latency.pl; done
op_latency,subop_latency,total_ops
0.192097526753471,0.0344513450167198,7549045
1.99137797628122,1.42198426157216,9184472
0.198062399664129,0.0387090378926376,6305973
0.621697271315762,0.396549768986993,9726679
29.5222496247375,18.2888846379615,10860858
0.229250239525916,0.0557482067611005,8149691
0.208981698303654,0.0375553180438224,6623842
0.47474766302086,0.292583928601509,9838777
0.339477790083925,0.101288409388438,9340212
0.186448840141895,0.0327296517417626,7081410
0.807598201207144,0.0139762289702332,6093531

(osd.598, the fifth line above, is an op hotspot here as well.)

This confirmed once more that osd.598 was having performance problems: its average op latency is around 30 seconds!

sar shows slightly higher disk I/O for osd.598's disk (/dev/sdf), but the difference is nowhere near as significant as what the OSD perf counters show:

reads  kbread  writes  kbwrite  %busy  avgqu  await  svctm
 37.3   459.9    89.8   4106.9   61.8    1.6   12.2    4.9
 42.3   545.8    91.8   4296.3   69.7    2.4   17.6    5.2
 42.0   483.8    93.1   4263.6   68.8    1.8   13.3    5.1
 39.7   425.5    89.4   4327.0   68.5    1.8   14.0    5.3

Another disk at the same time for comparison (/dev/sdb):

reads  kbread  writes  kbwrite  %busy  avgqu  await  svctm
 34.2   502.6    80.1   3524.3   53.4    1.3   11.8    4.7
 35.3   560.9    83.7   3742.0   56.0    1.2    9.8    4.7
 30.4   371.5    78.8   3631.4   52.2    1.7   15.8    4.8
 33.0   389.4    78.8   3597.6   54.2    1.4   12.1    4.8

Any idea why a couple of OSDs can become so slow that they impact the performance of the entire cluster?
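Appendix: since ~/do_calc_op_latency.pl is just a local helper, here is a minimal Python sketch of the kind of calculation it performs. It assumes the perf dump JSON has an "osd" section whose "op_latency" and "subop_latency" counters carry "avgcount" and "sum" fields (as on our Dumpling-era builds; counter names can differ between releases). The OSD id range and the way "total_ops" is counted below are illustrative and may not match the original helper exactly.

#!/usr/bin/env python
# Rough stand-in for do_calc_op_latency.pl: average latency = sum / avgcount.
import json
import subprocess

def avg_latency(perf, counter):
    """Return (average latency in seconds, event count) for one OSD perf counter."""
    c = perf["osd"][counter]
    count = c["avgcount"]
    return (c["sum"] / count if count else 0.0), count

print("osd,op_latency,subop_latency,total_ops")
for osd_id in range(594, 605):  # the OSDs on this host (assumption)
    asok = "/var/run/ceph/ceph-osd.%d.asok" % osd_id
    out = subprocess.check_output(["ceph", "--admin-daemon", asok, "perf", "dump"])
    perf = json.loads(out)
    op_avg, op_n = avg_latency(perf, "op_latency")
    subop_avg, subop_n = avg_latency(perf, "subop_latency")
    # "total_ops" here is simply op + subop event counts; the original helper
    # may count something slightly different.
    print("%d,%.6f,%.6f,%d" % (osd_id, op_avg, subop_avg, op_n + subop_n))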
Thanks,
Guang