On Wed, 29 Jul 2020 at 16:34, David Orman <ormandj@xxxxxxxxxxxx> wrote:
> Thank you, everyone, for the help. I absolutely was mixing up the two,
> which is why I was asking for guidance. The example made it clear. The
> question I was trying to answer was: what would the capacity of the cluster
> be, for actual data, based on the raw disk space + server/drive count +
> erasure coding profile. It sounds like the 'usable' calculation (66% in
> this case) is the accurate number, assuming I were to fill the cluster to
> 100%, which I realize is not ideal with Ceph.

Filling up like that is bad on almost all kinds of storage systems: any
storage with a concept of data that can move (so excluding tapes or
CD-ROMs, more or less) will want some extra space, and Ceph will start to
warn/act/refuse when you pass 85%, 90% and 95% full. So aim to start buying
more nodes/disks when your first OSD goes over 70% or so; otherwise you
will be doing a lot of manual work like rebalancing and reweighting to stay
under 85% until the new drives can be added to the system.

If you have few nodes, one host outage represents a large part of the
available storage. You can make all kinds of calculations on overhead and
things like "with EC 4+2 I can lose two drives and still recover", but if
you only have 6 hosts and one goes dead (for any reason), your total
capacity has fallen by 16.7%. So if you were at some 70% full with 6 hosts,
you will be all but completely full with only 5, which will cause issues
(like OSDs refusing I/O so they do not reach 100% full), even if you only
lost one shard of each EC 4+2 group from that host.

--
May the most significant bit of your life be positive.
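
As a quick sanity check on the numbers in the reply above, a rough Python
sketch (drive counts and sizes are made up for illustration; it assumes
homogeneous hosts, a host-level failure domain, and that surviving hosts
absorb a dead host's data evenly, and it ignores BlueStore and metadata
overhead; the helper names usable_tb and fill_after_host_loss are just
illustrative):

# A minimal sketch, assuming homogeneous hosts, host-level failure domain,
# and even redistribution after a host loss. Numbers below are made up.

def usable_tb(hosts, drives_per_host, drive_tb, k, m):
    """Raw capacity scaled by the EC efficiency k/(k+m)."""
    raw_tb = hosts * drives_per_host * drive_tb
    return raw_tb * k / (k + m)

def fill_after_host_loss(fill, hosts):
    """Approximate fill ratio once one host's data is rebuilt on the rest."""
    return fill * hosts / (hosts - 1)

hosts, drives, size_tb, k, m = 6, 10, 8.0, 4, 2
print("usable: %.0f TB of %.0f TB raw (%.0f%%)"
      % (usable_tb(hosts, drives, size_tb, k, m),
         hosts * drives * size_tb, 100.0 * k / (k + m)))

# 6 hosts at 70% full, one host dies: ~84% on the survivors,
# right at Ceph's default nearfull warning of 85%.
print("fill after losing 1 of %d hosts: %.0f%%"
      % (hosts, 100 * fill_after_host_loss(0.70, hosts)))

With these made-up figures (6 hosts of 10 x 8 TB drives, EC 4+2) that is
roughly 320 TB usable out of 480 TB raw, and losing one of six hosts while
70% full pushes the survivors to about 84%, right at the default nearfull
warning, which is the scenario described above.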