----- Original Message -----
> From: "Nick Fisk" <nick@xxxxxxxxxx>
> To: "Dominik Hannen" <hannen@xxxxxxxxx>
> CC: ceph-users@xxxxxxxxxxxxxx
> Sent: Wednesday, 29 April 2015 11:32:18
> Subject: RE: Cost- and Powerefficient OSD-Nodes
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Dominik Hannen
>> Sent: 29 April 2015 00:30
>> To: Nick Fisk
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: Cost- and Powerefficient OSD-Nodes
>>
>> > It's all about the total latency per operation. Most IO sizes over
>> > 10Gb don't make much difference to the round-trip time, but
>> > comparatively even 128KB IOs over 1Gb take quite a while. For example,
>> > ping a host with a 64k payload over 1Gb and 10Gb networks and look
>> > at the difference in times. Now double this for Ceph (client ->
>> > primary OSD -> secondary OSD).
>> >
>> > When you are using SSD journals you normally end up with a write
>> > latency of 3-4ms over 10Gb; 1Gb networking will probably increase this
>> > by another 2-4ms. IOPs = 1000 / latency (ms).
>> >
>> > I guess it all really depends on how important performance is.
>>
>> I reckon we are talking about single-threaded IOPs? It looks like 10ms
>> latency is in the worst-case region; 100 IOPs will do fine.
>>
>> At least in my understanding, heavily multi-threaded load should be able
>> to get higher IOPs regardless of latency?
>
> Yes, as the queue depth increases, so will total IOPs, but I found it
> quite hard to get above 40-50MB/s unless doing large block sizes.
>
>> Some presentation material suggested that the adverse effects of the
>> higher latency due to 1Gbit begin above IO sizes of 2k, so maybe there is
>> room to tune IOPs-hungry applications/VMs accordingly.
>>
>> > Just had a look and the Seagate Surveillance disks spin at 7200RPM
>> > (missed that you put that there), whereas the WD ones that I am
>> > familiar with spin at 5400RPM, so not as bad as I thought.
>> >
>> > So probably OK to use, but I don't see many people using them for
>> > Ceph / generic NAS, so I can't be sure there are no hidden gotchas.
>>
>> I am not sure how trustworthy Newegg reviews are, but somehow I am
>> getting some doubts about them now.
>> I guess it does not matter that much, at least if no more than a disk a
>> month is failing? The 3-year warranty gives some hope.
>>
>> Are there any cost-efficient HDDs that someone can suggest? (Most likely
>> 3TB drives; that seems to be the sweet spot at the moment.)
>
> I'm using WD Red Pro (the non-Pros are slower); reasonable cost, and they
> perform pretty much the same as the enterprise-line drives.

I guess I will be going with those or the WD Se then, whichever is cheaper.
The specs are, as far as I can tell, identical.

>> > Sorry, nothing in detail. I did actually build a Ceph cluster on the
>> > same 8-core CPU as you have listed. I didn't have any performance
>> > problems, but I do remember that with SSD journals, when doing high
>> > queue-depth writes, I could get the CPU quite high. It's like what I
>> > said before about the 1Gb vs 10Gb networking: how important is
>> > performance? If using this CPU gives you an extra 1ms of latency per
>> > OSD, is that acceptable?
>> >
>> > Agreed, 12 cores (guessing 2.5GHz each) will be overkill for just 12
>> > OSDs. I have a very similar spec and see exactly the same as you, but
>> > will change the nodes to 1 CPU each when I expand and use the spare
>> > CPUs for the new nodes.
>> > I'm using this:
>> >
>> > http://www.supermicro.nl/products/system/4U/F617/SYS-F617H6-FTPTL_.cfm
>> >
>> > Mainly because of rack density, which I know doesn't apply to you. But
>> > the fact they share PSUs/rails/chassis helps reduce power a bit and
>> > drives down cost.
>> >
>> > I can get 14 disks in each and they have 10Gb on board. The SAS
>> > controller is flashable to JBOD mode.
>> >
>> > Maybe one of the other Twin solutions might be suitable?
>>
>> I did consider that exact model (it was mentioned on the list some time
>> ago). I could get about the same effective storage capacity with it, but
>> 10G networking is just too expensive on the switch side.
>>
>> Also, those nodes and 10G switches consume a lot more power.
>>
>> By my estimates and the numbers I found, the Avoton nodes should run at
>> about 55W each. The switches (EX3300), according to the tech specs,
>> would need at most 76W each.
>
> Have you worked out how many watts per disk that is, though?
>
> 55W / 3 disks = 18.3W per disk
>
> My chassis at the moment:
> 170W / 12 disks = ~14.2W per disk

I will be running 4 disks/OSDs per node, so ~13.75W per disk. I hope the
real consumption will be around 50W. (I want to put the SSD inside the
node; an SSD failure is equivalent to a complete node failure anyway.)
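
To make the watts-per-disk comparison above easy to re-run with other node
sizes, here is a minimal Python sketch of the same arithmetic. The wattages
and disk counts are only the rough estimates quoted in this thread, not
measured values:

    # Watts-per-disk arithmetic from the figures quoted in this thread.
    def watts_per_disk(node_watts, disks):
        return node_watts / disks

    options = [
        ("Avoton node, 3 disks (Nick's assumption)", 55.0, 3),
        ("Current 12-disk chassis",                  170.0, 12),
        ("Planned Avoton node, 4 OSDs",              55.0, 4),
        ("Hoped-for real draw, 4 OSDs",              50.0, 4),
    ]

    for label, watts, disks in options:
        print("%-42s %5.2f W per disk" % (label, watts_per_disk(watts, disks)))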
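
And for the latency discussion earlier in the thread, a small sketch of the
IOPs = 1000 / latency rule of thumb, plus the effect of a deeper queue. The
latency figures are the rough numbers mentioned above (3-4ms over 10Gb,
roughly 2-4ms more over 1Gb), used here as assumptions, not benchmarks:

    # Single-threaded IOPs follow IOPs = 1000 / latency(ms): with one IO in
    # flight, each op must complete before the next one starts.
    def single_threaded_iops(latency_ms):
        return 1000.0 / latency_ms

    # With queue_depth IOs in flight the ceiling scales roughly linearly,
    # until CPU, disk or network bandwidth becomes the limit instead.
    def queued_iops_upper_bound(latency_ms, queue_depth):
        return queue_depth * 1000.0 / latency_ms

    for label, latency_ms in [
        ("SSD journal over 10Gb (~3-4 ms)", 3.5),
        ("SSD journal over 1Gb (~+2-4 ms)", 6.5),
        ("worst case discussed (10 ms)",    10.0),
    ]:
        print("%-34s ~%4.0f IOPs single-threaded, ~%5.0f IOPs at QD=16"
              % (label, single_threaded_iops(latency_ms),
                 queued_iops_upper_bound(latency_ms, 16)))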