I'm still pretty new at Ceph so take this with a grain of salt.
- We have tried both SSD journals and bcache, and in our experience plain SSD journals gave us better stability and performance. We also created an SSD pool from the remaining SSD space, and it did not perform much better than the spindles with SSD journals (Ceph is not yet optimized for SSD OSDs). From what I understand, SSD cache tiering is good if most of your disk I/O is reads with very few writes, because any write causes replication to all SSD OSDs in blocks of at least 4MB. In our environment, with VMs doing 30% reads and 70% writes to disk, it would not be beneficial. The best thing is to try it and see if it works for you.
- More OSDs may actually make the problem worse. There seems to be a direct relationship between CPU utilization and PG count: the higher the PG count, the higher the CPU utilization. A higher PG count gives you better data distribution across all your OSDs, but in one case a lower PG count gave us better performance because we no longer maxed out our CPUs. Too low a count, on the other hand, doesn't spread the traffic across enough OSDs. The formula at http://ceph.com/docs/master/rados/operations/placement-groups/#choosing-the-number-of-placement-groups targets about 100 PGs per OSD. Since we will have so many OSDs and a lot of free space, we may go lower in our PG count.
- I'm not really sure what is going on here. More details would help others diagnose it: the version of Ceph you are running, how it was deployed (manual, Puppet, ceph-deploy), which file system is used on the OSDs, how many OSD nodes and OSDs you have and what types of disks, whether the journals are on SSD or colocated, the hardware set-up, and the output of `ceph status` and `ceph osd tree`, along with the relevant logs from the problematic OSD.
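To make the PG arithmetic from the second point concrete, here is a rough sketch of the rule of thumb from the placement-groups page linked above: (OSDs x 100) / replica count, rounded up to the next power of two. The function names are mine, and the replica count of 3 is an assumption, since the pool size was not mentioned in the original mail.

```python
def suggested_pg_count(num_osds, replicas, target_pgs_per_osd=100):
    """Rule-of-thumb total PG count, rounded up to a power of two."""
    if num_osds < 1 or replicas < 1:
        raise ValueError("num_osds and replicas must be at least 1")
    raw = num_osds * target_pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


def pgs_per_osd(pg_num, replicas, num_osds):
    """Average number of PG copies each OSD ends up hosting."""
    return pg_num * replicas / num_osds


# Assumed replica count of 3; the OSD counts come from the original mail.
print(suggested_pg_count(26, 3))           # the 26 existing disks
print(suggested_pg_count(32, 3))           # after adding the last 6 drives
print(round(pgs_per_osd(1024, 3, 26), 1))  # current setting
print(round(pgs_per_osd(4096, 3, 26), 1))  # proposed setting
```

Under that assumption, 1024 PGs on 26 OSDs is already a little above the 100-per-OSD target, and 4096 would put each OSD at several times the target, which is where the extra CPU load would come from.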
On Mon, Oct 20, 2014 at 3:07 AM, Leszek Master <keksior@xxxxxxxxx> wrote:
Please help me.

1) If I want to use a cache tier, should I use it together with SSD journaling, or can I get better performance by using more SSD capacity for the cache tier?

2) I have a cluster made of 26x900GB SAS disks with SSD journaling, with 1024 placement groups. When I add a new OSD to the cluster, my VMs get I/O errors and get stuck, even with osd_max_backfills set to 1. If I change the PG count from 1024 to 4096, would the cluster be less affected by backfilling and recovery?

3) When I was adding my last 6 drives to the cluster, I noticed that the recovery speed dropped from 500-1000 MB/s to 10-50 MB/s. When I restarted the OSD that I was adding, the transfers went back to normal. I also noticed that when I then run a rados benchmark, transfers drop to 0 MB/s, sometimes several times in a row. Restarting the OSDs that I was adding, or restarting the ones already in the cluster one by one, solved the problem. What can it be? There isn't anything unusual in the logs. The whole cluster is stuck until I restart the OSDs or even recreate their journals. How can I solve this?

Best regards!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com