From all our calculations of clusters, going with smaller systems reduced the
TCO because of much cheaper hardware. Having 100 Ceph nodes is not an issue,
so you can scale small and large clusters with exactly the same hardware. But
please, prove me wrong: if you have a way to reduce the TCO even further, I
would love to hear about it.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.verges@xxxxxxxx
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Thu, 23 Apr 2020 at 05:18, lin.yunfan <lin.yunfan@xxxxxxxxx> wrote:

> I have seen a lot of people saying not to go with big nodes.
> What is the exact reason for that?
> I can understand that if the cluster is not big enough, the total node
> count could be too small to withstand a node failure, but if the cluster
> is big enough, wouldn't a big node be more cost effective?
>
> lin.yunfan
> lin.yunfan@xxxxxxxxx
>
> On 4/23/2020 06:33, Brian Topping <brian.topping@xxxxxxxxx> wrote:
>
> Great set of suggestions, thanks! One to consider:
>
> On Apr 22, 2020, at 4:14 PM, Jack <ceph@xxxxxxxxxxxxxx> wrote:
>
> I use 32 GB flash-based SATADOM devices for the root device.
> They are basically SSDs, and they do not take up front drive slots.
> As they never burn out, we never replace them.
> Ergo, the need to "open" the server is not an issue.
>
>
> This is probably the wrong forum to understand how you are not burning
> them out. Any kind of logs or monitor databases on a small SATADOM will
> cook it quickly, especially an MLC device, because there is no extra space
> for wear leveling and the like. I tried making it work with fancy systemd
> logging to memory and having those logs pulled by a log scraper that stores
> them on the actual data drives, but there was no place for the monitor DB.
> No monitor DB means Ceph doesn't load, and if a monitor DB gets corrupted,
> it is perilous for the cluster, and instant death if the monitors aren't
> replicated.
>
> My node chassis have two motherboards, and each is hard-limited to four
> SSDs. On each node, /boot is mirrored (RAID1) on partition 1, / is striped
> and mirrored (RAID10) on partition 2, and whatever is left on partition 3
> of each disk is used for Ceph data. This way any disk can fail and I can
> still boot. By merging the volumes (i.e. no SATADOM), wear leveling is
> statistically more effective, and I don't have to maintain or document a
> crazy system configuration that nobody would want.
>
> $0.02…
>
> Brian
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
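
To put Martin's TCO point in concrete terms, here is a small back-of-the-envelope
sketch in Python. Every price, drive count, and power figure in it is a
placeholder assumption, not a number from the thread; the point is only that
per-usable-TB cost can be compared directly once real quotes are plugged in.

# Purely illustrative TCO-per-usable-TB comparison. Every price, size, and
# power figure below is a placeholder assumption, not a number from the thread.

def tco_per_usable_tb(node_price, drives, drive_tb, drive_price,
                      power_watts, years=5, eur_per_kwh=0.30, replicas=3):
    capex = node_price + drives * drive_price
    opex = power_watts / 1000 * 24 * 365 * years * eur_per_kwh
    usable_tb = drives * drive_tb / replicas       # assuming 3x replication
    return (capex + opex) / usable_tb

big = tco_per_usable_tb(node_price=8000, drives=24, drive_tb=16,
                        drive_price=350, power_watts=450)
small = tco_per_usable_tb(node_price=1500, drives=6, drive_tb=16,
                          drive_price=350, power_watts=120)
print(f"big node:   ~{big:.0f} EUR per usable TB over 5 years")
print(f"small node: ~{small:.0f} EUR per usable TB over 5 years")

Which side wins obviously depends entirely on the real quotes you plug in.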
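
lin.yunfan's question about big nodes largely comes down to rebalance headroom:
when a node dies, its data has to be re-replicated onto the survivors without
pushing the cluster over Ceph's full ratio (0.95 by default). A minimal sketch,
assuming uniform node capacity and a hypothetical 86% fill level:

# Back-of-the-envelope: how close does one node failure push the cluster to
# Ceph's default full ratio (0.95)? The node counts and the 86% fill level
# are hypothetical; uniform node capacity is assumed.

FULL_RATIO = 0.95

def used_after_node_loss(nodes: int, used_fraction: float) -> float:
    """Cluster-wide used fraction once one node's data has been
    re-replicated onto the remaining nodes."""
    return used_fraction * nodes / (nodes - 1)

for nodes in (10, 60):
    after = used_after_node_loss(nodes, used_fraction=0.86)
    verdict = "below" if after < FULL_RATIO else "OVER"
    print(f"{nodes:>3} nodes: {after:.1%} used after losing one node "
          f"({verdict} the full ratio)")

With these placeholder numbers the 10-node build ends up over the full ratio
after a single failure, while the 60-node build barely notices it.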
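
Brian's warning about small SATADOMs can also be made concrete with a rough
endurance estimate. The TBW rating and the daily write volume below are
assumptions chosen only for illustration, not measured figures:

# Rough wear-out estimate for a small boot device that also holds the mon DB
# and logs. Both inputs are assumptions; check the vendor's TBW rating and the
# real write rate (e.g. SMART counters) before drawing any conclusions.

ENDURANCE_TBW = 20        # assumed rating for a 32 GB MLC SATADOM
WRITES_GB_PER_DAY = 100   # assumed RocksDB compaction + log traffic on a busy mon

days = ENDURANCE_TBW * 1000 / WRITES_GB_PER_DAY
print(f"Estimated device life: {days:.0f} days (~{days / 365:.1f} years)")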
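
For reference, Brian's per-node layout can be sketched as follows; the SSD and
partition sizes are assumptions, only the layout itself comes from his
description:

# Sketch of the per-node split Brian describes (four SSDs per board:
# p1 = /boot in RAID1, p2 = / in RAID10, p3 = one Ceph OSD per disk).
# The disk and partition sizes are assumptions, not from the thread.

DISKS = 4
DISK_GB = 960    # assumed SSD size
BOOT_GB = 1      # p1 on every disk, RAID1 across all four
ROOT_GB = 50     # p2 on every disk, RAID10 across all four

ceph_gb = DISKS * (DISK_GB - BOOT_GB - ROOT_GB)
print(f"/boot usable: {BOOT_GB} GB (RAID1)")
print(f"/     usable: {DISKS * ROOT_GB // 2} GB (RAID10)")
print(f"Ceph  usable: {ceph_gb} GB raw, across {DISKS} OSD partitions")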