I found a place to paste my output of `ceph daemon osd.xx config show` for all my OSDs:
If you want it in a gzip'd txt file, you can download it here:
dstat -cd --disk-util -D sda,sdb,sdc,sdd,sde,sdf,sdg,sdh --disk-tps
I don't have any client load on my cluster at this point to show any good output, but with just '11 active+clean+scrubbing+deep' running, I am seeing 70-80% disk utilization for each OSD according to dstat.
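(For anyone who just wants the scrub/recovery knobs without wading through the full dump, something like the following against a single OSD's admin socket should do it; osd.0 is just an arbitrary example here:)

$ sudo ceph daemon osd.0 config show | grep -E 'osd_recovery|osd_max_backfills|osd_scrub'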
On Thu, Sep 3, 2015 at 2:34 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
Can you post the output of ceph daemon osd.xx config show? (probably as an attachment).

There are several things that I've seen cause it:
1) too many PGs but too few degraded objects makes it seem "slow" (if you just have 2 degraded objects but restarted a host with 10K PGs, it will probably have to scan all the PGs)
2) sometimes the process gets stuck when a toofull condition occurs
3) sometimes the process gets stuck for no apparent reason - restarting the currently backfilling/recovering OSDs fixes it
setting osd_recovery_threads sometimes fixes both 2) and 3), but usually not
4) setting recovery_delay_start to anything > 0 makes recovery slow (even 0.0000001 makes it much slower than a simple 0). On the other hand we had to set it high as a default because of slow ops when restarting OSDs, which was partially fixed by this.

Can you see any bottleneck in the system? CPU spinning, disks reading? I don't think this is the issue, just make sure it's not something more obvious...

Jan

On 02 Sep 2015, at 22:34, Bob Ababurko <bob@xxxxxxxxxxxx> wrote:

When I lose a disk OR replace an OSD in my POC ceph cluster, it takes a very long time to rebalance. I should note that my cluster is slightly unique in that I am using cephfs (shouldn't matter?) and it currently contains about 310 million objects.

The last time I replaced a disk/OSD was 2.5 days ago and it is still rebalancing. This is on a cluster with no client load.

The configuration is 5 hosts with 6 x 1TB 7200rpm SATA OSDs & 1 850 Pro SSD which holds the journals for those OSDs. That means 30 OSDs in total. The system disk is on its own disk. I'm also using a backend network with a single Gb NIC. The rebalancing rate (objects/s) seems to be very slow when it is close to finishing... say <1% objects misplaced.

It doesn't seem right that it would take 2+ days to rebalance a 1TB disk with no load on the cluster. Are my expectations off?

I'm not sure if my pg_num/pgp_num needs to be changed, or whether the rebalance time depends on the number of objects in the pool. These are thoughts I've had but am not certain are relevant here.

$ sudo ceph -v
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)

$ sudo ceph -s
[sudo] password for bababurko:
    cluster f25cb23f-2293-4682-bad2-4b0d8ad10e79
     health HEALTH_WARN
            5 pgs backfilling
            5 pgs stuck unclean
            recovery 3046506/676638611 objects misplaced (0.450%)
     monmap e1: 3 mons at {cephmon01=10.15.24.71:6789/0,cephmon02=10.15.24.80:6789/0,cephmon03=10.15.24.135:6789/0}
            election epoch 20, quorum 0,1,2 cephmon01,cephmon02,cephmon03
     mdsmap e6070: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
     osdmap e4395: 30 osds: 30 up, 30 in; 5 remapped pgs
      pgmap v3100039: 2112 pgs, 3 pools, 6454 GB data, 321 Mobjects
            18319 GB used, 9612 GB / 27931 GB avail
            3046506/676638611 objects misplaced (0.450%)
                2095 active+clean
                  12 active+clean+scrubbing+deep
                   5 active+remapped+backfilling
recovery io 2294 kB/s, 147 objects/s

$ sudo rados df
pool name            KB           objects    clones  degraded  unfound  rd        rd KB        wr         wr KB
cephfs_data          6767569962   335746702  0       0         0        2136834   1            676984208  7052266742
cephfs_metadata      42738        1058437    0       0         0        16130199  30718800215  295996938  3811963908
rbd                  0            0          0       0         0        0         0            0          0
  total used         19209068780  336805139
  total avail        10079469460
  total space        29288538240

$ sudo ceph osd pool get cephfs_data pgp_num
pg_num: 1024
$ sudo ceph osd pool get cephfs_metadata pgp_num
pg_num: 1024

thanks,
Bob
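(As a side note, the recovery/backfill tunables Jan mentions can also be changed at runtime with injectargs instead of editing ceph.conf and restarting OSDs. The values below are purely illustrative, not recommendations, and injected settings do not survive an OSD restart:)

$ sudo ceph tell osd.* injectargs '--osd-recovery-delay-start 0'
$ sudo ceph tell osd.* injectargs '--osd-max-backfills 2 --osd-recovery-max-active 4'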
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com