Hello,
I've been trying to nail down a nasty performance issue related to scrubbing. I am mostly using radosgw, with a handful of buckets containing millions of variously sized objects. Whenever Ceph scrubs, whether regular or deep, radosgw blocks on external requests, the cluster accumulates requests blocked for > 32 seconds, and OSDs are frequently marked down.
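For anyone wanting to reproduce what I'm seeing, these are the stock Ceph CLI commands I've been using to correlate the blocked requests with scrubs, plus the scrub flags as a temporary mitigation (nothing exotic, just standard Hammer tooling):
deploy@drexler:~$ ceph health detail            # lists the blocked/slow requests per OSD
deploy@drexler:~$ ceph pg dump | grep scrub     # shows which PGs are scrubbing right now
deploy@drexler:~$ ceph osd set nodeep-scrub     # stop new deep scrubs while debugging
deploy@drexler:~$ ceph osd set noscrub          # likewise for regular scrubs
deploy@drexler:~$ ceph osd unset nodeep-scrub   # re-enable afterwards
deploy@drexler:~$ ceph osd unset noscrub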
According to atop, the OSDs being deep scrubbed are reading at only 5 MB/s to 8 MB/s, and a scrub of a 6.4 GB placement group takes 10-20 minutes. Here's a screenshot of atop from one of the nodes: https://s3.amazonaws.com/rwgps/screenshots/DgSSRyeF.png
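(Sanity-checking my own numbers, since the two measurements should agree: 6.4 GB read at 5-8 MB/s works out to roughly 13-21 minutes, so the per-PG scrub time is at least consistent with the read rate atop reports.)
deploy@drexler:~$ echo $(( 6400 / 8 / 60 )) $(( 6400 / 5 / 60 ))   # minutes at 8 MB/s and 5 MB/s
13 21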
First question: is this a reasonable scrubbing speed for a very lightly used cluster? Here are some cluster details:
deploy@drexler:~$ ceph --version
ceph version 0.94.1-5-g85a68f9 (85a68f9a8237f7e74f44a1d1fbbd6cb4ac50f8e8)
2x Xeon E5-2630 per node, 64 GB of RAM per node.
deploy@drexler:~$ ceph status
cluster 234c6825-0e2b-4256-a710-71d29f4f023e
health HEALTH_WARN
118 requests are blocked > 32 sec
monmap e1: 3 mons at {drexler=10.0.0.36:6789/0,lucy=10.0.0.38:6789/0,paley=10.0.0.34:6789/0}
election epoch 296, quorum 0,1,2 paley,drexler,lucy
mdsmap e19989: 1/1/1 up {0=lucy=up:active}, 1 up:standby
osdmap e1115: 12 osds: 12 up, 12 in
pgmap v21748062: 1424 pgs, 17 pools, 3185 GB data, 20493 kobjects
10060 GB used, 34629 GB / 44690 GB avail
1422 active+clean
1 active+clean+scrubbing+deep
1 active+clean+scrubbing
client io 721 kB/s rd, 33398 B/s wr, 53 op/s
deploy@drexler:~$ ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 43.67999 root default
-2 14.56000 host paley
0 3.64000 osd.0 up 1.00000 1.00000
3 3.64000 osd.3 up 1.00000 1.00000
6 3.64000 osd.6 up 1.00000 1.00000
9 3.64000 osd.9 up 1.00000 1.00000
-3 14.56000 host lucy
1 3.64000 osd.1 up 1.00000 1.00000
4 3.64000 osd.4 up 1.00000 1.00000
7 3.64000 osd.7 up 1.00000 1.00000
11 3.64000 osd.11 up 1.00000 1.00000
-4 14.56000 host drexler
2 3.64000 osd.2 up 1.00000 1.00000
5 3.64000 osd.5 up 1.00000 1.00000
8 3.64000 osd.8 up 1.00000 1.00000
10 3.64000 osd.10 up 1.00000 1.00000
My OSDs are 4 TB 7200 RPM Hitachi DeskStars on XFS, with Samsung 850 Pro journals (very slow as journals; S3700 replacements are on order, but as far as I understand things the journals shouldn't affect read performance). MONs are co-located with the OSD nodes, but the nodes are fairly beefy and have very low load. Drives sit on an expander backplane behind an LSI SAS3008 controller.
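(For what it's worth, the way I confirmed the 850 Pros are poor journals is the usual O_DSYNC direct-write test below; /dev/sdX is a placeholder for a spare device, and the test is destructive to whatever is on it. Good journal SSDs sustain a high rate here; the 850 Pro does not. That's a write-path issue though, so it shouldn't explain slow scrub reads.)
deploy@drexler:~$ sudo dd if=/dev/zero of=/dev/sdX bs=4k count=10000 oflag=direct,dsync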
I have a fairly standard config as well:
https://gist.github.com/kingcu/aae7373eb62ceb7579da
I know that I don't have a ton of OSDs, but I'd expect somewhat better performance than this. Check out the Munin graphs for my three nodes:
http://munin.ridewithgps.com/ridewithgps.com/drexler.ridewithgps.com/index.html#disk
http://munin.ridewithgps.com/ridewithgps.com/paley.ridewithgps.com/index.html#disk
http://munin.ridewithgps.com/ridewithgps.com/lucy.ridewithgps.com/index.html#disk
Any input would be appreciated before I start micro-optimizing config params (a sketch of the knobs I have in mind is below) or upgrading to Infernalis.
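In case it helps frame suggestions, these are the scrub-related settings I'd be looking at first. Values are illustrative rather than tested recommendations, and the defaults are quoted from memory against Hammer, so double-check before applying:
[osd]
# only one concurrent scrub per OSD (already the default)
osd max scrubs = 1
# sleep between scrub chunks so client IO can get through (default 0)
osd scrub sleep = 0.1
# run the disk/scrub thread at idle IO priority (only effective with the CFQ scheduler)
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
# read size used by deep scrub (default 524288, i.e. 512 KB)
osd deep scrub stride = 1048576
These can also be injected at runtime rather than restarting OSDs, e.g.:
deploy@drexler:~$ ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'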
Cheers,
Cullen