Re: Fast Ceph Cluster with PB storage

Hello,

On Mon, 08 Aug 2016 17:39:07 +0300 Александр Пивушков wrote:

> 
> Hello dear community!
> I'm new to Ceph and only recently started looking into building clusters,
> so your opinions are very important to me.
> We need to create a cluster with 1.2 PB of storage and very fast access to the data. Until now, "Intel® SSD DC P3608 Series 1.6TB NVMe PCIe 3.0 x4 Solid State Drive" disks were used and their speed satisfies us completely, but as the storage volume grows the price of such a cluster rises very steeply, hence the idea of using Ceph.

You may want to tell us more about your environment, use case and in
particular what your clients are.
Large amounts of data usually mean graphical or scientific data, while
extremely high speed (IOPS) requirements usually mean database-like
applications. Which one is it, or is it a mix?

For example, how were the above NVMes deployed and how did they serve data
to the clients?
The fiber channel bit in your HW list below makes me think you're using
VMware, FC and/or iSCSI right now.

> There are the following requirements:
> - A data set of 160 GB should be readable and writable at the speed of the SSD P3608
Again, how are they serving data now?
The speeds (latency!) a local NVMe can reach are of course impossible to
match with a network-attached SDS like Ceph.
160GB is tiny, are you sure about this number?
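To put some very rough numbers on it: a local NVMe write completes in
tens of microseconds, while a replicated Ceph write involves at least two
network round trips (client to primary OSD, primary to the replicas) plus
the journal writes, so expect something on the order of 1-2ms per
synchronous write. That's one to two orders of magnitude more latency, no
matter how fast the underlying devices are.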

> - A high-speed, 36 TB store must be created from the SSD drives, with read/write speeds approaching those of the SSD P3608
How is that different to the point above?

> - A 1.2 PB store must be created; the higher the access speed, the better ...
Ceph scales well.
> - Must have triple redundancy
Also not an issue, depending on how you define this.
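If triple redundancy simply means 3 copies of every object, that's just
the pool replication size (the default these days anyway), along the
lines of (pool name and PG counts below are made-up examples, do the PG
math properly with pgcalc):

  ceph osd pool create fast-pool 2048 2048 replicated
  ceph osd pool set fast-pool size 3
  ceph osd pool set fast-pool min_size 2

If it means surviving the loss of whole racks or rooms, that's a CRUSH
map (failure domain) design question on top of that.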

> I do not yet fully understand how to create a configuration with the SSD P3608 disks. Of course, the configuration needs to be changed, as it is very expensive.

There are HW guides and plenty of discussion about how to design largish
clusters; find and read them.
Like the ML threads:
"800TB - Ceph Physical Architecture Proposal"
"dense storage nodes"

Also read up on Ceph cache-tiering. 
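The basic recipe looks roughly like this (pool names are examples, and
you'll want to understand the hit_set, target_max_* and dirty ratio
parameters before letting this loose on production data):

  ceph osd pool create ssd-cache 1024 1024 replicated
  ceph osd tier add sata-pool ssd-cache
  ceph osd tier cache-mode ssd-cache writeback
  ceph osd tier set-overlay sata-pool ssd-cache
  ceph osd pool set ssd-cache hit_set_type bloom
  ceph osd pool set ssd-cache target_max_bytes 30000000000000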

> InfiniBand will be used, as well as 40 Gb Ethernet.
> We will also use virtualization on high-performance hardware to optimize the number of physical servers.

What VM stack/environment?
If it is VMware, Ceph is a bad fit as the most stable way to export Ceph
storage to this platform is NFS, which is also the least performant
(AFAIK).
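If you do end up going down that road, the usual pattern is an RBD image
mapped on one or two gateway boxes and re-exported via NFS, roughly like
this (image name, mount point and size are made up, and you'd still have
to sort out HA for the gateway yourself):

  rbd create rbd/vmware-ds01 --size 10240000
  rbd map rbd/vmware-ds01
  mkfs.xfs /dev/rbd0
  mkdir -p /export/vmware-ds01
  mount /dev/rbd0 /export/vmware-ds01
  echo '/export/vmware-ds01 *(rw,no_root_squash,sync)' >> /etc/exports
  exportfs -ra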

> I'm not tied to specific server models or manufacturers. I have only created the cluster scheme, which should be criticized :) 
> 
> 1. OSD nodes - 13 units.
>      a. 1.4 TB SSD drive (analogue of the Intel® SSD DC P3608 Series) - 2 units

For starters, that's not how you'd likely deploy high-speed storage; see
CPU below.
Also this gives you 36TB un-replicated capacity, so you'll need 2-3 times
the amount to be safe. 
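To spell that out:

  13 nodes x 2 x 1.4TB     = ~36TB raw SSD
  36TB / 3 (replication)   = ~12TB usable
  minus the headroom you need for rebalancing (don't run pools much
  past 70-80% full)        = ~9-10TB you can actually count on

So if the 36TB SSD requirement is real, plan for roughly three times the
raw SSD capacity listed here.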

>      b. Fiber Channel 16 Gbit / c - 2 port.
What for?
If you need a FC GW (bad idea), 2 (dedicated if possible) machines will do.

And where/what is your actual network HW?

>      c. An array (not RAID) of 284 TB built from SATA drives (36 drives of 8TB each);

Ceph works better with not overly large storage nodes and OSDs. 
I know you're trying to minimize rack space and cost, but something with
fewer OSDs per node and 4TB per OSD is going to be easier to get right.

>      d. 360 GB SSD (analogue of the Intel SSD DC S3500) - 1 unit
What is that for?
Ceph only performs decently (with the current filestore) when using SSDs
as journals for the HDD-based OSDs; a single SSD won't cut it, and an
S3500 likely has insufficient endurance anyway.

For 36 OSDs you're looking at seven 400GB DC S3710s or three 400GB DC P3700s...
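With filestore the journal device is simply specified when the OSD is
created, e.g. with ceph-deploy (host and device names below are of course
made up, adjust to taste):

  # HDD OSD with its journal on a partition of the shared SSD/NVMe
  ceph-deploy osd create osd-node01:sdc:/dev/nvme0n1

plus a sensible journal size in ceph.conf:

  [osd]
      osd journal size = 10240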

>      e. SATA drive 40 GB for installation of the operating system (or booting from the network, which is preferable)
>      f. RAM 288 GB
Generous, but will help reads.

>      g. 2 x CPU - 9 core 2 Ghz. - E-5-2630v4
Firstly, that's a 10-core, 2.2GHz CPU.
Secondly, it is most likely underpowered if serving both NVMes and 36 HDD OSDs.
A 400GB DC S3610 (so slower SATA, not NVMe) will eat about three 2.2GHz
cores when doing small write IOPS.
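Back-of-the-envelope: 2 x E5-2630v4 gives you 20 cores at 2.2GHz, so
roughly 44GHz. With the usual rough rule of thumb of about 1GHz per HDD
OSD, the 36 spinners alone want most of that under load, and the two
NVMes will happily eat whatever is left, leaving nothing for the OS,
network and FC overhead.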

There are several saner approaches I can think of, but these depend on the
answers to the questions above.


> 2. MON - 3 units. All virtual servers:
Virtual servers can work, but I prefer real (even if shared) HW.
3 is the absolute minimum; 5 would be a good match.

>      a. 1 Gbps Ethernet - 1 port.
While the MONs don't have much data traffic, the lower latency of a faster
network would be helpful.

If you actually need MDS, make those (real) servers also MONs and put
the rest on OSD nodes or VMs.

>      b. SATA drive 40 GB for installation of the operating system (or booting from the network, which is preferable)
>      c. SATA drive 40 GB
MONs like fast storage for their leveldb. 

>      d. 6GB RAM
A bit low, but most likely enough.

>      e. 1 x CPU - 2 cores at 1.9 Ghz
Enough for most scenarios; faster cores would be better.


> 3. MDS - 2 units. All virtual servers:
Do you actually know what MDS do?
And where in your use case is CephFS needed or required?

>      a. 1 Gbps Ethernet - 1 port.
>      b. SATA drive 40 GB for installation of the operating system (or booting from the network, which is preferable)
>      c. SATA drive 40 GB
>      d. 6GB RAM
>      e. 1 x CPU - min. 2 cores at 1.9 Ghz
Definitely not; you want physical nodes with the same level of networking
as the OSDs and your main clients.
You will also want faster and more cores and way more memory (at least
64GB); how much depends on your CephFS size (number of files).
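For a rough idea, the MDS cache is controlled by 'mds cache size' in
ceph.conf (a number of inodes, not bytes; the default is a measly
100000), and each cached inode costs a few KB of RAM, so something like

  [mds]
      mds cache size = 4000000

already translates into 10GB+ of RAM for the MDS process alone. Treat the
per-inode cost as a ballpark figure and test with your actual file mix.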

> To accelerate things, I assume SSDs will be used for the cache and the OSD journal/log.
MDS don't hold any local data (caches); a logging SSD is fine. 


Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



