I've been meaning to write an email about the experience we had at the company I work for. Lacking a more complete write-up, I'll just share some of the findings. Please note these are my experiences and are correct for my environment. The clients are running on OpenStack, and all servers are Ubuntu Trusty. Tests were made with Hammer (0.94.2).
TL;DR: if performance is your objective, buy 1S (single-socket) boxes with high clock frequency, good journal SSDs, and not many SSDs. Also change the CPU frequency governor to performance instead of the default ondemand. And don't forget 10GbE is a must. Replicated pools are also a must for performance.
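For the governor point, here is a minimal sketch of how one might check which CPUs are not yet in performance mode. It assumes the standard Linux cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); the function names are my own and purely illustrative.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its current governor."""
    governors = {}
    # cpu[0-9]* skips sibling directories such as 'cpufreq' and 'cpuidle'.
    for gov_file in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        governors[gov_file.parent.parent.name] = gov_file.read_text().strip()
    return governors


def cpus_not_in_performance(governors):
    """Return the sorted names of CPUs whose governor is not 'performance'."""
    return sorted(name for name, gov in governors.items() if gov != "performance")


if __name__ == "__main__":
    for name in cpus_not_in_performance(read_governors()):
        print(f"{name} is not using the performance governor")
```

Changing the governor is typically done by writing `performance` into the same file as root; whether that persists across reboots depends on the distribution's tooling.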
What we still have to do revolves around jemalloc vs. tcmalloc; Trusty has the bug affecting the thread cache bytes variable. We also still have to test various tunables, like threads, caches, etc.
Hope this helps.
On Sat, Aug 22, 2015 at 4:45 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
Another thing that is probably worth considering is the practical side as
well. A lot of the Xeon E5 boards tend to have more SAS/SATA ports and
onboard 10GbE, which can make quite a difference to the overall cost of the
solution if you would otherwise need to buy extra PCIe cards.
Unless I've missed one, I've not spotted a Xeon D board with a large number
of onboard SATA/SAS ports. Please let me know if such a system exists, as I
would be very interested.
We settled on the Hadoop version of the Supermicro Fat Twin: 12 x 3.5" disks
+ 2 x 2.5" SSDs per U, onboard 10GBase-T, and the fact that the nodes share
a chassis and PSUs keeps the price down. For bulk storage, one of these with
a single 8-core low-clocked E5 Xeon is ideal in my mind. I did a spreadsheet
working out U space, power and cost per GB for several different types of
server, and this solution came out ahead in nearly every category.
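The kind of spreadsheet comparison described above can be sketched in a few lines. This is only an illustration of the arithmetic: the class, the option names and every number below are hypothetical placeholders, not figures from the actual spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class ServerOption:
    name: str
    rack_units: float   # U of rack space per node
    disks: int          # data disks per node
    disk_tb: float      # capacity of each data disk in TB
    price: float        # purchase price per node, any currency
    watts: float        # typical power draw per node

    def raw_tb(self):
        return self.disks * self.disk_tb

    def cost_per_gb(self):
        return self.price / (self.raw_tb() * 1000)

    def tb_per_u(self):
        return self.raw_tb() / self.rack_units

    def watts_per_tb(self):
        return self.watts / self.raw_tb()


if __name__ == "__main__":
    # Hypothetical options with made-up numbers, for illustration only.
    options = [
        ServerOption("dense-1U-node", 1, 12, 4.0, 5000.0, 250.0),
        ServerOption("generic-2U-node", 2, 12, 4.0, 6500.0, 350.0),
    ]
    for opt in sorted(options, key=ServerOption.cost_per_gb):
        print(f"{opt.name}: {opt.cost_per_gb():.3f}/GB, "
              f"{opt.tb_per_u():.1f} TB/U, {opt.watts_per_tb():.1f} W/TB")
```

Raw capacity is used here; dividing by the replica count would give usable capacity instead.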
If there is a requirement for a high-performance SSD tier, I would probably
look at dedicated SSD nodes, as I doubt you could cram enough CPU power into
a single server to drive 12 SSDs.
You mentioned low latency was a key requirement. Is this always going to be
at low queue depths? If you just need very low latency but won't actually be
driving the SSDs very hard, you will probably find a very highly clocked E3
with 2-4 SSDs per node is the best bet. However, if you drive the SSDs hard,
a single one can easily max out several cores.
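As a rough way to reason about the queue-depth question, the cores-versus-clock trade-off can be approximated with a textbook M/M/c queue (Erlang C). This is a sketch under strong assumptions: each I/O is treated as purely CPU-bound, the per-core service rate is assumed to scale linearly with clock speed, and disk, network and lock contention are ignored. The rate constant and core counts are hypothetical.

```python
from math import factorial


def erlang_c(servers, offered_load):
    """Probability that an arriving request has to queue in an M/M/c system."""
    rho = offered_load / servers
    if rho >= 1:
        raise ValueError("system is overloaded (utilisation >= 1)")
    top = offered_load ** servers / factorial(servers)
    bottom = (1 - rho) * sum(
        offered_load ** k / factorial(k) for k in range(servers)
    ) + top
    return top / bottom


def mean_latency(cores, service_rate, arrival_rate):
    """Mean time in system: queueing delay plus one service time."""
    p_wait = erlang_c(cores, arrival_rate / service_rate)
    return p_wait / (cores * service_rate - arrival_rate) + 1 / service_rate


if __name__ == "__main__":
    ios_per_ghz = 5000.0  # hypothetical I/Os per second per GHz per core
    for arrival in (1000.0, 30000.0, 65000.0):
        fast = mean_latency(4, 3.5 * ios_per_ghz, arrival)
        slow = mean_latency(8, 2.0 * ios_per_ghz, arrival)
        print(f"{arrival:>8.0f} IO/s: 4 fast cores {fast * 1e6:.1f} us, "
              f"8 slow cores {slow * 1e6:.1f} us")
```

In this model, latency at light load is dominated by the single service time, so the faster cores win; close to saturation the queueing term dominates and total capacity matters more.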
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Mark Nelson
> Sent: 22 August 2015 00:00
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: OSD GHz vs. Cores Question
>
> FWIW, we recently were looking at a couple of different options for the
> machines in our test lab that run the nightly QA suite jobs via
> teuthology.
>
> From a cost/benefit perspective, I think it really comes down to something
> like a Xeon E3-12XXv3 or the new Xeon D-1540, each of which has
> advantages/disadvantages.
>
> We were very tempted by the Xeon D, but it was still just a little too new
> for us, so we ended up going with servers using more standard E3
> processors. The Xeon D setup was slightly cheaper, offers more theoretical
> performance, and is way lower power, but at a much slower per-core clock
> speed. It's likely that for our functional tests clock speed may be more
> important than the cores (but on these machines we'll only have 4 OSDs per
> server).
>
> Anyway, I suspect that either setup will probably work fairly well for
> spinners. SSDs get trickier.
>
> Mark
>
> On 08/21/2015 05:46 PM, Robert LeBlanc wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA256
> >
> > We are looking to purchase our next round of Ceph hardware and based
> > off the work by Nick Fisk [1] our previous thought of cores over clock
> > is being revisited.
> >
> > I have two camps of thoughts and would like to get some feedback, even
> > if it is only theoretical. We currently have 12 disks per node (2
> > SSD/10 4TB spindle), but we may adjust that to 4/8. SSD would be used
> > for journals and cache tier (when [2] and fstrim are resolved). We
> > also want to stay with a single processor for cost, power and NUMA
> > considerations.
> >
> > 1. For 12 disks with three threads each (2 client and 1 background),
> > lots of slower cores would allow I/O (ceph code) to be scheduled as
> > soon as a core is available.
> >
> > 2. Faster cores would get through the Ceph code faster but there would
> > be fewer cores and so some I/O may have to wait to be scheduled.
> >
> > I'm leaning towards #2 for these reasons, please point out anything I may
> > be missing:
> > * The latency will only really be improved in the SSD I/O with faster
> > clock speed, all writes and any reads from the cache tier. So 8 fast
> > cores might be sufficient, reading from spindle and flushing the
> > journal will have a "substantial" amount of sleep to allow other Ceph
> > I/O to be hyperthreaded.
> > * Even though SSDs are much faster than spindles they are still orders
> > of magnitude slower than the processor, so it is still possible to get
> > more lines of code executed between SSD I/O with a faster processor
> > even with fewer cores.
> > * As the Ceph code is improved through optimization and less code has
> > to be executed for each I/O, faster clock speeds will only provide
> > even more benefit (lower latency, less waiting for cores) as the delay
> > shifts more from CPU to disk.
> >
> > Since our workload is typically small I/O 12K-18K, latency means a lot
> > to our performance.
> >
> > Our current processors are Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
> >
> > [1] http://www.spinics.net/lists/ceph-users/msg19305.html
> > [2] http://article.gmane.org/gmane.comp.file-systems.ceph.user/22713
> >
> > Thanks,
> > - ----------------
> > Robert LeBlanc
> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
> > -----BEGIN PGP SIGNATURE-----
> > Version: Mailvelope v1.0.0
> > Comment: https://www.mailvelope.com
> >
> >
> > wsFcBAEBCAAQBQJV16pfCRDmVDuy+mK58QAA9cgP/RwsZESriIMWZHeC0PmS
> > CH8iEFCXCRCzvW+lYMwB9FOvPmBLlhayp39Z93Djv3sef02t3Z9NFPq7fUmb
> > ZwZ9SnH9oVmRElbQyNtt8MfJ2cqXRU6JtYsTHnZ5G0+sFvv+BY+mYD89nULw
> > xwbsosUCBA9Rp8geq++XLSbuEBt8AfreYaSBzY1kg51Ovtmb97R0hB7bQBWP
> > oUgi/ET24w4sUqLSo4WBNBZ0WeWsRA4w5PEzHk28ynBY0B/GAtiGadtZWOFX
> > 6bNz3KjMbLEWU9UF+7WyL+ppru6RIUZeayFp3tdIzqQdMbeBDPO54miOezwv
> > 9iFNuzxj2P6jqlp18W2SZYN2JF5qCgrG5mXlU2bOM9k4IlQAqG2V3iD/rSF8
> > LmL/FSzU6C4k8PffaNis/grZAtjN4tCLRAoWUmsXSRW1NpSNm13l6wJfg5xq
> > XGLQ4CfGMV/o3a1Oz1M7jfMLWb0b6TeYlqC8eeHUp9ipa8IaVKsGNDJYQOnM
> > LvyRuyB7yIM6dEXmJjE5ZQPwbh0se3+hUhNolQ949aKrY2u8Q2kHhKqOyzuw
> > EAAyHkeqBtAZFW+DActHYVCi9lJO8shmeWuVKxAuzKYJGYzD8yVIS+AVqZ2k
> > OH2/NNAXzBKefsL1gd8DT4QuYqDoEN2arO+PN0vZeEruQ4vg6qZvabqeB/4o
> > kUd4
> > =F5Sx
> > -----END PGP SIGNATURE-----
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >