Hi Gregor,
DB space usage will be mostly governed by the number of onodes and
blobs/extents/etc (potentially caused by fragmentation). If you are
primarily using RBD and/or large files in CephFS and you aren't doing a
ton of small overwrites, your DB usage could remain below 1%. If you
have tons of small files and/or small overwrites, though, staying that
low may be tougher, especially as the cluster gets closer to being
full.
Mark
On 11/13/22 8:54 AM, Gregor Radtke wrote:
Hi folks,
I am in the unfortunate situation that I do not have enough, or big
enough, NVMe devices available to accommodate DB+WAL for my spinning
rust-based OSDs. In a few words, I have three hosts with 20 5.5T HDDs
each and a 1.6TB PCIe NVMe.
When sizing this and putting together the BoM, I had the old-style
level mechanism in mind, which meant L0+L1 with default settings
results in ~ 30G per OSD for DB (and WAL, as far as I understand; we
do not plan to have a separate WAL device). Thus, 32G per DB+WAL would
result in 32*20 = 640G of capacity needed on the NVMe, which should be
fine.
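As a sanity check, the BoM arithmetic above can be written out as a short sketch. The 32G-per-OSD budget, OSD count, and 1.6TB NVMe size are the figures quoted in the mail; this is plain arithmetic, not output from any Ceph tool:

```python
# Sketch of the original BoM sizing: with the old-style RocksDB level
# mechanism, L0+L1 at defaults needs ~30 GB of DB per OSD; the mail
# budgets 32 GB per OSD for DB+WAL.
db_per_osd_gb = 32   # budget per OSD (DB+WAL), from the mail
osds = 20            # HDD OSDs per host
nvme_gb = 1600       # the 1.6 TB PCIe NVMe per host

needed = db_per_osd_gb * osds
print(f"needed: {needed} GB, available: {nvme_gb} GB, "
      f"headroom: {nvme_gb - needed} GB")
# -> needed: 640 GB, available: 1600 GB, headroom: 960 GB
```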
Our workload will almost exclusively be RBD, with one CephFS for ISO
images. No RGW, AFAIK. Now I want to understand whether we can still
go with ~ 1% of data device capacity, or whether we need to provision
more. If we cannot provision more, what would the impact be? Is RBD
fine while CephFS would suffer a lot?
5.5T * 1% would result in ~ 60G per OSD; * 20 (or later 24) OSDs =
1200G (1440G) of usage on the NVMe for DB+WAL.
For all my calculations, I am putting my faith in the official Ceph
documentation:
"The general recommendation is to have block.db size in between 1% to
4% of block size. For RGW workloads, it is recommended that the
block.db size isn’t smaller than 4% of block, because RGW heavily uses
it to store metadata (omap keys). For example, if the block size is
1TB, then block.db shouldn’t be less than 40GB. For RBD workloads, 1%
to 2% of block size is usually enough."
However, Proxmox apparently recommends 10% of the data device size to
provision for DB+WAL. Obviously, 4% would quadruple the cost, and 10%
even more so. We don’t want to achieve stellar performance, just
“good-enough” for ~ 200 VMs that currently reside on redundant
Datacore FC storage backed by FC RAID (N*16G, 24x 7.2k). I expect
commit latency for Ceph with QD=1 to be ~ 3x compared with the old
solution, but all other parameters would probably be a lot better.
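To make the cost trade-off concrete, here is a hedged sketch contrasting the 1% (Ceph docs, RBD), 4% (Ceph docs, RGW), and 10% (Proxmox) recommendations against the 1.6TB NVMe and 20 OSDs per host quoted above. It is only arithmetic on the numbers from this thread, not a measurement:

```python
# Compare DB+WAL provisioning rules from the thread against the
# per-host NVMe capacity: 1% and 4% from the Ceph docs, 10% from the
# Proxmox recommendation.
hdd_gb = 5500    # one 5.5 TB HDD, in GB
osds = 20        # HDD OSDs per host
nvme_gb = 1600   # available NVMe per host, in GB

for pct in (1, 4, 10):
    per_osd = hdd_gb * pct / 100
    total = per_osd * osds
    verdict = "fits" if total <= nvme_gb else "does not fit"
    print(f"{pct:>2}%: {per_osd:.0f} GB/OSD, {total:.0f} GB total -> {verdict}")
# -> only the 1% rule (1100 GB total) fits on the 1.6 TB NVMe;
#    4% needs 4400 GB and 10% needs 11000 GB per host.
```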
Cheers,
Gregor
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx