Re: BLUEFS_SPILLOVER


 



It's an interesting article, so it is a fixed size; maybe the article is older.
I have another cluster where just 2 OSDs share a 1.92 TB NVMe drive for DB, and there is also spillover there, but this is the value:

osd.0 spilled over 43 GiB metadata from 'db' device (602 GiB used of 894 GiB) to slow device

Seems like Ceph handles it somehow dynamically, because here it is double the size :/

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

-----Original Message-----
From: Janne Johansson <icepic.dz@xxxxxxxxx> 
Sent: Thursday, September 16, 2021 1:03 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxx>
Subject: Re:  BLUEFS_SPILLOVER


Den tors 16 sep. 2021 kl 06:28 skrev Szabo, Istvan (Agoda)
<Istvan.Szabo@xxxxxxxxx>:
>
> Hi,
>
> Something weird is happening: I have 1 NVMe drive that is used for WAL and DB for 3x SSDs.
> The LVM is 596 GB, but the health detail says x GiB spilled over
> to the slow device, even though only 317 GB is used :/
>
> [WRN] BLUEFS_SPILLOVER: 3 OSD(s) experiencing BlueFS spillover
>      osd.10 spilled over 41 GiB metadata from 'db' device (317 GiB used of 596 GiB) to slow device
>      osd.14 spilled over 5.8 GiB metadata from 'db' device (317 GiB used of 596 GiB) to slow device
>      osd.27 spilled over 94 GiB metadata from 'db' device (313 GiB used of 596 GiB) to slow device
>
> I don't understand. Does this warning mean it just spilled over in the past but got back to normal? Also, why doesn't it use all of the LVM?
>

Lowlevel info at:
https://github.com/facebook/rocksdb/wiki/Leveled-Compaction

The RocksDB defaults use 2.7G, 27G, 270G per "level" of DB, and it spills over to the data device if the DB is too large for the sum of all previous levels. So if your DB is 317G, the next level will not be created on the DB device unless there were ~3000G free to hold it, and hence it spills over to the data device.
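To make the arithmetic concrete, here is a minimal sketch (plain Python, not actual Ceph/BlueStore code) of that level-sum logic, assuming the default level sizes of roughly 2.7G, 27G, 270G, 2700G:

    # Which RocksDB levels fit entirely on a DB device of a given size?
    # Assumed default level sizes, in GB: 2.7, 27, 270, 2700, ...
    def usable_db_space_gb(db_device_gb, level_sizes_gb=(2.7, 27, 270, 2700)):
        usable = 0.0
        cumulative = 0.0
        for level in level_sizes_gb:
            cumulative += level
            if cumulative > db_device_gb:
                break        # this level no longer fits, so it lands on the slow device
            usable = cumulative
        return usable

    # A 596 GB DB LVM can only hold 2.7 + 27 + 270 = ~300 GB of levels;
    # the next level would need ~3000 GB in total, so anything beyond ~300 GB spills.
    print(usable_db_space_gb(596))   # -> 299.7

That is why the OSDs above spill over even though the 596 GB LVM looks mostly empty.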

I think recent releases have an option to make use of DB sizes that do not match the 3/30/300 sizes, but up to that point it was recommended to set the DB to 3, 30 or 300 GB depending on capacity (and some wanted twice this amount to allow room for compaction, so 6, 60 or 600).
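As a rough illustration of that rule of thumb (just a sketch, not an official sizing tool), picking the largest tier a given partition can effectively use:

    # Pick the largest 3/30/300 GB tier that fits on the DB partition,
    # optionally doubled to leave headroom for compaction.
    def recommended_db_tier_gb(partition_gb, double_for_compaction=True):
        factor = 2 if double_for_compaction else 1
        for tier in (300, 30, 3):
            if tier * factor <= partition_gb:
                return tier
        return None

    print(recommended_db_tier_gb(596, double_for_compaction=False))  # -> 300
    print(recommended_db_tier_gb(596))   # -> 30, since 2 x 300 = 600 GB does not fit

So with the doubling guideline, a 596 GB partition falls just short of what the 300 GB tier would want.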

--
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



