Re: NVMe disk - size

Paul Emmerich <paul.emmerich@xxxxxxxx> · Fri, 15 Nov 2019 15:26:30 +0100

On Fri, Nov 15, 2019 at 3:16 PM Kristof Coucke <kristof.coucke@xxxxxxxxx> wrote:
> We’ve configured a Ceph cluster with 10 nodes, each having 13 large disks (14TB) and 2 NVMe disks (1.6TB).
> The recommendations I’ve read in the online documentation state that the block.db device should be around 4% to 5% of the slow device. So the block.db should be somewhere between 600GB and 700GB as a best practice.

That recommendation is unfortunately not based on any facts :(
How much you really need depends on your actual usage.

> However, I was thinking of reserving only 200GB per OSD on the fast device, which is 1/3 of the recommendation.

For various weird internal reasons, it will currently only use ~30 GB
in steady-state operation before spilling over to the slow device;
300 GB would be the next magic number
(search the mailing list for details).
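
A rough sketch of where numbers like ~30 GB and ~300 GB can come from.
It assumes the explanation commonly given on the list: a RocksDB level
is only kept on the fast device if it fits entirely, with a 256 MB base
level and a 10x multiplier per level. Both values are assumptions and
are not stated in this message; the function names are made up for
illustration.

```python
# Assumed RocksDB sizing parameters; not taken from this message.
BASE_LEVEL_MB = 256   # assumed size of the first level
MULTIPLIER = 10       # assumed growth factor between levels


def cumulative_level_sizes_gb(levels=4):
    """Return the running total (in GB) after each level L1..Ln."""
    totals = []
    running_mb = 0
    for n in range(levels):
        running_mb += BASE_LEVEL_MB * MULTIPLIER ** n
        totals.append(running_mb / 1000)
    return totals


def usable_db_gb(partition_gb, levels=4):
    """Largest cumulative level total that fits in the partition (0 if none)."""
    fitting = [t for t in cumulative_level_sizes_gb(levels) if t <= partition_gb]
    return fitting[-1] if fitting else 0.0


if __name__ == "__main__":
    for size in (60, 200, 300, 600):
        print(f"{size} GB partition -> ~{usable_db_gb(size):.0f} GB used")
```

Under this simplified model a 200 GB partition ends up in the same
bucket as a 30 GB one, which is why the step up to ~300 GB matters.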

> Is it recommended to still use it as a block.db

yes

> or is it recommended to only use it as a WAL device?

no, there is no advantage to that if it's that large

> Should I just split the NVMe in three and only configure 3 OSDs to use the system? (This would mean that the performance would be degraded to the speed of the slowest device…)

no

> We’ll only use the system through the RGW (no CephFS or block devices), and we’ll store “a lot” of small files on it… (Millions of files a day)

the current setup gives you roughly 1.3 TB of usable metadata space,
which may or may not be enough; it really depends on how much "a lot"
is and how small "small" is.
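
A back-of-the-envelope sketch of how a figure of that size can be
reached. It assumes ~30 GB of effectively used DB space per OSD (the
steady-state number above) and 3x replication; the replication factor is
an assumption and is not stated in this thread.

```python
def usable_metadata_tb(nodes, osds_per_node, db_gb_per_osd, replicas):
    """Cluster-wide DB space divided by the replica count, in TB."""
    raw_gb = nodes * osds_per_node * db_gb_per_osd
    return raw_gb / replicas / 1000


# 10 nodes x 13 OSDs from the original question; 30 GB and 3 replicas are assumed.
estimate = usable_metadata_tb(nodes=10, osds_per_node=13, db_gb_per_osd=30, replicas=3)
print(f"{estimate:.1f} TB")  # prints "1.3 TB"
```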

It might be better to use the NVMe disks as dedicated OSDs and map all
metadata pools onto them directly. That allows you to fully utilize
the space for RGW metadata (but not Ceph metadata in the data pools)
without running into weird db size restrictions.
There are advantages and disadvantages to both approaches.
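
A minimal sketch of what the dedicated-OSD variant could look like. It
only generates the command lines and runs nothing. The rule name is
hypothetical, the pool names assume the default RGW zone, and it assumes
the NVMe OSDs carry the "nvme" device class; check all of these against
your cluster first.

```python
RULE = "rgw-meta-nvme"  # hypothetical CRUSH rule name

# Assumed pool names for a default RGW zone; yours may differ.
META_POOLS = [
    ".rgw.root",
    "default.rgw.control",
    "default.rgw.meta",
    "default.rgw.log",
    "default.rgw.buckets.index",
]


def nvme_pool_commands(pools, rule=RULE, root="default",
                       failure_domain="host", device_class="nvme"):
    """Build the CLI calls that pin the given pools to one device class."""
    cmds = [
        f"ceph osd crush rule create-replicated "
        f"{rule} {root} {failure_domain} {device_class}"
    ]
    cmds += [f"ceph osd pool set {pool} crush_rule {rule}" for pool in pools]
    return cmds


if __name__ == "__main__":
    for cmd in nvme_pool_commands(META_POOLS):
        print(cmd)
```

Changing a pool's crush_rule triggers data movement, so review the
generated commands before running any of them.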

Paul

>
> The reason I’m asking is that I’ve been able to break the test system (long story), causing OSDs to fail as they ran out of space… Expanding the disks (the block DB device as well as the main block device) failed with the ceph-bluestore-tool…
>
> Thanks for your answer!
>
> Kristof
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com