Re: Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release


 



Adding more nodes is best if you have unlimited budget :)
You should add more OSDs per node until you start hitting CPU or network bottlenecks. Use a perf tool like atop or sysstat to know when that happens.
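If it helps, here is a minimal sketch of that kind of check in Python using psutil; the interface name, the 85% thresholds and the sampling interval are assumptions for illustration, not values atop or sysstat use:

    import psutil

    NIC = "eth0"                    # assumption: name of the 10GbE interface
    LINK_BYTES_PER_S = 10e9 / 8     # 10Gb/s link expressed in bytes/s
    INTERVAL = 5                    # seconds between samples

    while True:
        before = psutil.net_io_counters(pernic=True)[NIC]
        cpu = psutil.cpu_percent(interval=INTERVAL)      # blocks for INTERVAL seconds
        after = psutil.net_io_counters(pernic=True)[NIC]
        tx = (after.bytes_sent - before.bytes_sent) / INTERVAL
        rx = (after.bytes_recv - before.bytes_recv) / INTERVAL
        nic_pct = max(tx, rx) / LINK_BYTES_PER_S * 100
        print("cpu %5.1f%%  nic %5.1f%%" % (cpu, nic_pct))
        if cpu > 85 or nic_pct > 85:
            print("-> this node is near saturation; more OSDs here won't help much")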




-------- Original message --------
From: kevin parrikar <kevin.parker092@xxxxxxxxx>
Date: 07/01/2017 19:56 (GMT+02:00)
To: Lionel Bouton <lionel-subscription@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

Wow, that's a lot of good information. I wish I had known all this before investing in these devices. Since I don't have any other option, I will get better SSDs and faster HDDs.
I have one more general question about Ceph.
To increase the throughput of a cluster, what is the standard practice: more OSDs "per" node, or more OSD "nodes"?

Thanks a lot for all your help. I learned so many new things, thanks again.

Kevin

On Sat, Jan 7, 2017 at 7:33 PM, Lionel Bouton <lionel-subscription@xxxxxxxxxxx> wrote:
On 07/01/2017 at 14:11, kevin parrikar wrote:
Thanks for your valuable input.
We were using these SSDs in our NAS box (Synology), and they were giving 13k IOPS for our fileserver in RAID1. We had a few spare disks which we added to our Ceph nodes, hoping they would give performance as good as the NAS box. (I am not comparing the NAS with Ceph, just explaining why we decided to use these SSDs.)

We don't have S3520 or S3610 drives at the moment but can order one of these to see how it performs in Ceph. We have 4x S3500 80GB handy.
If I create a 2-node cluster with 2x S3500 each and a replica count of 2, do you think it can deliver 24MB/s of 4k writes?

Probably not. See http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

According to the page above, the DC S3500 reaches 39MB/s. Its capacity isn't specified; yours are only 80GB, which is the lowest capacity I'm aware of, and for all DC models I know of the speed goes down with the capacity, so you will probably get less than that.
If you put both data and journal on the same device you cut your bandwidth in half: this gives you an average of <20MB/s per OSD (with occasional peaks above that if you don't have a sustained 20MB/s). With 4 OSDs and size=2, your total write bandwidth is <40MB/s. For a single stream of data you will only get <20MB/s though (you won't benefit from parallel writes to the 4 OSDs and will only write to 2 at a time).
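To make the arithmetic explicit, a quick back-of-the-envelope calculation; the 39MB/s figure comes from the linked post, the rest is just the halving described above:

    ssd_write_mb_s = 39.0      # per-SSD journal write speed from the linked post
    osds = 4
    replicas = 2               # size=2

    per_osd = ssd_write_mb_s / 2            # journal and data share the same SSD
    cluster_bw = per_osd * osds / replicas  # every client byte is written 'replicas' times
    single_stream = per_osd                 # one stream only hits 2 OSDs (one PG) at a time

    print("per OSD      : <%.1f MB/s" % per_osd)        # ~19.5, i.e. the <20MB/s above
    print("whole cluster: <%.1f MB/s" % cluster_bw)     # ~39, i.e. the <40MB/s above
    print("single stream: <%.1f MB/s" % single_stream)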

Note that by comparison the 250GB 840 EVO only reaches 1.9MB/s.

But even if you reach the 40MB/s, these models are not designed for heavy writes; you will probably kill them long before their warranty expires (IIRC these are rated for ~24GB of writes per day over the warranty period). In your configuration you only have to write 24GB each day at the cluster level to be in this situation (you have 4 of them, but writing both journal and data with size=2 turns each client byte into 4 device writes). That is an average of only 0.28 MB/s, compared to your 24 MB/s target.
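A minimal sketch of that endurance arithmetic, assuming the ~24GB/day rating recalled above and the 4x amplification (journal+data times size=2):

    # How much client traffic four 80GB S3500s can absorb per day.
    drives = 4
    rated_gb_per_day = 24        # per drive, over the warranty period (as recalled above)
    amplification = 2 * 2        # x2 journal+data on the same SSD, x2 for size=2

    client_gb_per_day = drives * rated_gb_per_day / amplification    # 24 GB/day
    avg_mb_per_s = client_gb_per_day * 1024 / 86400                  # ~0.28 MB/s

    print("sustainable client writes: %.0f GB/day (~%.2f MB/s on average)"
          % (client_gb_per_day, avg_mb_per_s))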

We bought the S3500 because the last time we tried Ceph, people were suggesting this model :) :)

The 3500 series might be enough with the higher capacities in some rare cases but the 80GB model is almost useless.

You have to do the math (a small worked example follows this list), considering:
- how much you will write to the cluster (guess high if you have to guess),
- if you will use the SSD for both journals and data (which means writing twice on them),
- your replication level (which means you will write multiple times the same data),
- when you expect to replace the hardware,
- the amount of writes per day they support under warranty (if the manufacturer doesn't present this number prominently they probably are trying to sell you a fast car headed for a brick wall)
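Putting that checklist into numbers, a minimal sketch; the function and all example inputs are purely illustrative assumptions:

    # Hypothetical helper: endurance each SSD needs, given the points above.
    def required_gb_per_day_per_ssd(client_gb_per_day, replicas,
                                    journal_on_same_ssd, ssd_count,
                                    safety_factor=10):
        amplification = replicas * (2 if journal_on_same_ssd else 1)
        per_ssd = client_gb_per_day * amplification / ssd_count
        return per_ssd * safety_factor     # 10x headroom for rebalancing, recovery, etc.

    # Example: 100 GB of client writes/day, size=2, co-located journals, 4 SSDs
    need = required_gb_per_day_per_ssd(100, 2, True, 4)
    print("each SSD should be rated for >= %.0f GB of writes per day" % need)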

If your hardware can't handle the amount of writes you expect to put on it then you are screwed. There have been reports of new Ceph users who were not aware of this and used cheap SSDs that all failed at the same time within a few months. You definitely don't want to be in their position.
In fact, as problems happen (a hardware failure leading to cluster storage rebalancing, for example), you should probably get a system able to handle 10x the amount of writes you expect it to handle, then monitor the SSD SMART attributes to be alerted long before the drives die and replace them before problems happen. You definitely want a controller that gives you access to this information. If you can't get it, you will have to monitor the writes and guess this value, which is risky, as write amplification inside SSDs is not easy to guess...
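For the SMART monitoring, a minimal sketch around smartctl; attribute IDs vary by vendor, so check what your own drives report:

    # Read the wear-out attribute with smartctl and warn early.
    # 233 (Media_Wearout_Indicator) is what Intel DC SSDs typically expose --
    # check the smartctl output for your own drives.
    import subprocess

    def wearout(device):
        out = subprocess.check_output(["smartctl", "-A", device], text=True)
        for line in out.splitlines():
            fields = line.split()
            if fields and fields[0] == "233":   # ID# column
                return int(fields[3])           # normalized VALUE, starts near 100
        return None

    for dev in ("/dev/sda", "/dev/sdb"):        # adjust to your OSD/journal SSDs
        value = wearout(dev)
        if value is not None and value < 20:
            print("%s: wear indicator down to %d, plan a replacement" % (dev, value))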

Lionel

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
