Re: Building a Pb EC cluster for a cheaper cold storage

11.11.2015 06:14, Christian Balzer wrote:
> 
> Hello,
> 
> On Tue, 10 Nov 2015 13:29:31 +0300 Mike Almateia wrote:
> 
>> Hello.
>>
>> For our CCTV stream-storage project we decided to use a Ceph cluster with
>> an EC pool.
>> The input requirements are not scary: max. 15 Gbit/s of input traffic from
>> the CCTV, 30 days of retention, 99% write operations, and the cluster must
>> be able to grow without downtime.
>>
> I have a production cluster that is also nearly write only.
> 
> I'd say that 1.5GB/s is a pretty significant amount of traffic, but not
> scary in and of itself.
> The question is how many streams we are talking about, and how you are
> writing that data (to CephFS, RBD volumes)?

Special CCTV software, running on at most 70 Windows KVM VMs, will store
the streams on a local drive. No CephFS, only RBD.

Traffic will be balanced across the 70 VMs.

So I expect at most 70 streams into the cluster, each at around
200 Mbit/s.
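
Quick arithmetic behind that (just a back-of-the-envelope check in Python):

# 70 streams at ~200 Mbit/s each -> aggregate ingest into the cluster
streams = 70
stream_mbit = 200
print(streams * stream_mbit / 1000, "Gbit/s")   # ~14 Gbit/s, close to the 15 Gbit/s max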

> 
> All of this will decide how IOPS-intensive (as opposed to
> throughput-intensive) storing your streams will be.
> 

99% writes, only big blocks, sequential writes.
I expect around 2000 IOPS with 1 MB blocks and QD=32.
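
That estimate comes from spreading the ingest over 1 MB writes (a rough
sketch, nothing more):

# 15 Gbit/s of incoming video written in 1 MB blocks -> required write IOPS
ingest_gbit_s = 15
block_mb = 1
mb_per_s = ingest_gbit_s * 1000 / 8         # ~1875 MB/s aggregate
print(round(mb_per_s / block_mb), "IOPS")   # ~1875, so ~2000 IOPS is a fair ceiling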

>> For now our vision of the architecture is:
>> * 6 JBODs with 90 x 8 TB HDDs each (540 HDDs total)
>> * 6 Ceph servers, each connected to its own JBOD (so we will have 6
>> pairs: 1 server + 1 JBOD).
>>
> As you guessed yourself and as Paul suspects as well, I think the number
> of OSDs per node is too high, more of a CPU than a RAM problem, plus the
> other tuning it will require.

Yes, CPU is needed for the cache tier and EC, but if we only have around
250 MB/s per server, maybe it will be enough?
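
The per-server figure is roughly this (the 10+3 EC profile below is only an
assumed example, not a decision):

# Client ingest spread over the nodes, plus EC write overhead
ingest_gb_s = 1.5        # the ~1.5 GB/s figure from above
nodes = 6
k, m = 10, 3             # assumed EC profile, for illustration only

per_node_client = ingest_gb_s / nodes             # ~0.25 GB/s of client data
per_node_disk = per_node_client * (k + m) / k     # ~0.33 GB/s actually hitting the disks
print(f"{per_node_client*1000:.0f} MB/s client, ~{per_node_disk*1000:.0f} MB/s to disk")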

> 
> Also the cache tier HDDs (unless they're SSDs) are likely going to be
> another bottleneck.
> 

There is no need for a fast cache tier if we use the read-only cache
policy. An SSD drive also eats more CPU.

> Consider this alternative:
> 
> * Same JBOD chassis
> * Quite different Ceph nodes:
> - 1 or 2 RAID controllers with the most cache you can get (I like Areca's
>   with 4GB, YMMV). That cache (and the journal SSDs suggested below)
>   should take care of things if your 15GBit/s is sufficiently fragmented
>   to cause large amounts of IOPS.
> - 8x 11 disk RAID6, depending on how many controllers you have 1 or 2
>   global hotspares. 
> - 256GB RAM or more, tuned to .
> - If you can afford it, use FAST SSDs (or NVMe) as journals. You want to
>   be able to saturate your network, so around 2GB/s. 
>   Four Intel DC S3700 400GB will get you close to that.
> - Since you now only have 8 OSDs per node, your CPU requirements are more
>   to the tune of 12 (fast, 2.5GHz++) cores.
> 
> With "failproof" OSDs, you can choose 2x (not the default 3x) replication.
> 
> Another bonus is that you'll likely never have a failed OSD and the
> resulting traffic storm.

It's interesting, but much more expensive, and we lose too much capacity:
replicas instead of EC, plus RAID6 (N-2), plus at least 2 hot spares per
JBOD.

It is beyond the limit of our budget.
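
Rough usable-capacity comparison (a sketch; the 10+3 EC profile is again
just an assumption):

# 6 nodes x 90 x 8 TB HDDs
HDD_TB, DRIVES_PER_NODE, NODES = 8, 90, 6
raw_tb = HDD_TB * DRIVES_PER_NODE * NODES              # 4320 TB raw

# Option 1: EC pool straight on the disks (assumed 10+3 profile)
k, m = 10, 3
ec_usable_tb = raw_tb * k / (k + m)                    # ~3323 TB

# Option 2: per node 8 x 11-disk RAID6 (90 - 2 hot spares = 88 data disks),
# then 2x replication on top
arrays, disks_per_array, parity = 8, 11, 2
raid6_node_tb = arrays * (disks_per_array - parity) * HDD_TB   # 576 TB per node
repl_usable_tb = raid6_node_tb * NODES / 2                     # ~1728 TB

print(f"raw: {raw_tb} TB")
print(f"EC {k}+{m}: ~{ec_usable_tb:.0f} TB usable")
print(f"RAID6 + 2x replication: ~{repl_usable_tb:.0f} TB usable")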

> 
> The trick to keeping things happy here is to have enough RAM for all hot
> objects that need to be read, especially inodes and other FS metadata.
> 
> Of course if you can afford it (price/space), having less dense nodes will
> significantly reduce the impact of a node failure.
> 
>> Ceph servers hardware details:
>> * 2 x E5-2690v3: 24 cores total (w/o HT), 2.6 GHz
>> * 256 GB DDR4 RAM
>> * 4 x 10 Gbit/s NIC ports (2 for the client network and 2 for the
>> cluster network)
>> * the servers also have 4 (8) x 2.5" SATA HDDs on board for the cache
>> tiering feature (because Ceph clients can't talk directly to an EC pool)
>> * two SAS HBA controllers with multipathing, for an HA scenario.
> A bit of overkill, given that your failure domain will still be at least
> one storage node, or worse depending on network/switch topology.
> 
> Regards,
> 
> Christian
> 


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



