Hello,
I've been trying to nail down a nasty performance issue related to scrubbing. I am mostly using radosgw, with a handful of buckets containing millions of variously sized objects. Whenever Ceph scrubs, whether regular or deep, radosgw blocks on external requests, the cluster accumulates requests blocked for > 32 seconds, and OSDs are frequently marked down.
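For anyone wanting to reproduce what I'm seeing, these are the stock Ceph CLI commands I've been using to correlate the blocked requests with scrubs, plus the scrub flags as a temporary mitigation (nothing exotic, just standard Hammer tooling):
deploy@drexler:~$ ceph health detail            # lists the blocked/slow requests per OSD
deploy@drexler:~$ ceph pg dump | grep scrub     # shows which PGs are scrubbing right now
deploy@drexler:~$ ceph osd set nodeep-scrub     # stop new deep scrubs while debugging
deploy@drexler:~$ ceph osd set noscrub          # likewise for regular scrubs
deploy@drexler:~$ ceph osd unset nodeep-scrub   # re-enable afterwards
deploy@drexler:~$ ceph osd unset noscrub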
According to atop, the OSDs being deep scrubbed are reading at only 5 MB/s to 8 MB/s, and a scrub of a 6.4 GB placement group takes 10-20 minutes. Here's a screenshot of atop from one of the nodes: https://s3.amazonaws.com/rwgps/screenshots/DgSSRyeF.png
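(Sanity-checking my own numbers, since the two measurements should agree: 6.4 GB read at 5-8 MB/s works out to roughly 13-21 minutes, so the per-PG scrub time is at least consistent with the read rate atop reports.)
deploy@drexler:~$ echo $(( 6400 / 8 / 60 )) $(( 6400 / 5 / 60 ))   # minutes at 8 MB/s and 5 MB/s
13 21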
First question: is this a reasonable scrubbing speed for a very lightly used cluster? Here are some cluster details:
deploy@drexler:~$ ceph --version
ceph version 0.94.1-5-g85a68f9 (85a68f9a8237f7e74f44a1d1fbbd6cb4ac50f8e8)
2x Xeon E5-2630 per node, 64 GB of RAM per node.
deploy@drexler:~$ ceph status
cluster 234c6825-0e2b-4256-a710-71d29f4f023e
health HEALTH_WARN
118 requests are blocked > 32 sec
monmap e1: 3 mons at {drexler=10.0.0.36:6789/0,lucy=10.0.0.38:6789/0,paley=10.0.0.34:6789/0}
election epoch 296, quorum 0,1,2 paley,drexler,lucy
mdsmap e19989: 1/1/1 up {0=lucy=up:active}, 1 up:standby
osdmap e1115: 12 osds: 12 up, 12 in
pgmap v21748062: 1424 pgs, 17 pools, 3185 GB data, 20493 kobjects
10060 GB used, 34629 GB / 44690 GB avail
1422 active+clean
1 active+clean+scrubbing+deep
1 active+clean+scrubbing
client io 721 kB/s rd, 33398 B/s wr, 53 op/s
deploy@drexler:~$ ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 43.67999 root default
-2 14.56000 host paley
0 3.64000 osd.0 up 1.00000 1.00000
3 3.64000 osd.3 up 1.00000 1.00000
6 3.64000 osd.6 up 1.00000 1.00000
9 3.64000 osd.9 up 1.00000 1.00000
-3 14.56000 host lucy
1 3.64000 osd.1 up 1.00000 1.00000
4 3.64000 osd.4 up 1.00000 1.00000
7 3.64000 osd.7 up 1.00000 1.00000
11 3.64000 osd.11 up 1.00000 1.00000
-4 14.56000 host drexler
2 3.64000 osd.2 up 1.00000 1.00000
5 3.64000 osd.5 up 1.00000 1.00000
8 3.64000 osd.8 up 1.00000 1.00000
10 3.64000 osd.10 up 1.00000 1.00000
My OSDs are 4 TB 7200 RPM Hitachi DeskStars on XFS, with Samsung 850 Pro journals (very slow as journals; S3700 replacements are on order, but as far as I understand things the journals shouldn't affect read performance). MONs are co-located with the OSD nodes, but the nodes are fairly beefy and have very low load. Drives sit on an expander backplane behind an LSI SAS3008 controller.
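(For what it's worth, the way I confirmed the 850 Pros are poor journals is the usual O_DSYNC direct-write test below; /dev/sdX is a placeholder for a spare device, and the test is destructive to whatever is on it. Good journal SSDs sustain a high rate here; the 850 Pro does not. That's a write-path issue though, so it shouldn't explain slow scrub reads.)
deploy@drexler:~$ sudo dd if=/dev/zero of=/dev/sdX bs=4k count=10000 oflag=direct,dsync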
I have a fairly standard config as well:
https://gist.github.com/kingcu/aae7373eb62ceb7579da
I know that I don't have a ton of OSDs, but I'd expect somewhat better performance than this. Check out the Munin graphs for my three nodes:
http://munin.ridewithgps.com/ridewithgps.com/drexler.ridewithgps.com/index.html#disk
http://munin.ridewithgps.com/ridewithgps.com/paley.ridewithgps.com/index.html#disk
http://munin.ridewithgps.com/ridewithgps.com/lucy.ridewithgps.com/index.html#disk
Any input would be appreciated before I start micro-optimizing config params (a sketch of the knobs I have in mind is below) or upgrading to Infernalis.
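In case it helps frame suggestions, these are the scrub-related settings I'd be looking at first. Values are illustrative rather than tested recommendations, and the defaults are quoted from memory against Hammer, so double-check before applying:
[osd]
# only one concurrent scrub per OSD (already the default)
osd max scrubs = 1
# sleep between scrub chunks so client IO can get through (default 0)
osd scrub sleep = 0.1
# run the disk/scrub thread at idle IO priority (only effective with the CFQ scheduler)
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
# read size used by deep scrub (default 524288, i.e. 512 KB)
osd deep scrub stride = 1048576
These can also be injected at runtime rather than restarting OSDs, e.g.:
deploy@drexler:~$ ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'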
Cheers,
Cullen