Re: Nautilus: BlueFS spillover

Igor Fedotov <ifedotov@xxxxxxx> · Fri, 27 Sep 2019 14:31:17 +0300

Hi Eugen,

Generally, expanding existing DB devices isn't enough to immediately 
eliminate the spillover alert, because the spilled-over data is already 
on the slow device and isn't moved by the expansion itself. In theory, 
the alert will eventually disappear once RocksDB has completely rewritten 
all the data that sits on the slow device.

I've never tested how this plays out in practice, though...

So you should either wait a while for things to stabilize, perhaps 
monitoring the spilled-over volumes reported in the alerts, which should 
presumably decrease (or at least not grow). Please note that this will 
most probably happen only under some load.
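
To make the "monitor the spilled-over volumes" idea concrete, here is a minimal sketch that parses the detail lines of the BLUEFS_SPILLOVER warning (in the format quoted further down in this thread) and compares two snapshots. The function names and the snapshot-comparison approach are my own assumptions for illustration, not part of any Ceph tooling.

```python
import re

# One detail line of the warning, e.g.
# "osd.0 spilled over 2.5 GiB metadata from 'db' device (...) to slow device"
SPILL_RE = re.compile(
    r"osd\.(\d+) spilled over ([\d.]+) (B|KiB|MiB|GiB|TiB) metadata"
)
UNIT = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_spillover(text):
    """Return {osd_id: spilled_bytes} from the health detail text."""
    # Mail clients and terminals wrap long lines, so collapse whitespace first.
    flat = " ".join(text.split())
    return {
        int(m.group(1)): int(float(m.group(2)) * UNIT[m.group(3)])
        for m in SPILL_RE.finditer(flat)
    }


def trend(before, after):
    """Classify each OSD by how its spilled-over volume changed."""
    result = {}
    for osd, old in before.items():
        new = after.get(osd, 0)
        if new == 0:
            result[osd] = "cleared"
        elif new < old:
            result[osd] = "shrinking"
        elif new == old:
            result[osd] = "stable"
        else:
            result[osd] = "growing"
    # OSDs that only show up in the later snapshot
    for osd in after:
        if osd not in before:
            result[osd] = "new"
    return result
```

One would feed it the saved output of `ceph health detail` taken at two points in time; anything reported as "growing" suggests waiting alone will not help.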

Or migrate the BlueFS data off the main device with ceph-bluestore-tool, 
which I'd recommend only in an emergency.

Or try to compact the DB with ceph-kvstore-tool, which is unlikely to help.
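
For reference, a rough sketch of what the two tool invocations might look like, written as small helpers that only build and print the command lines (they run nothing). The OSD path layout /var/lib/ceph/osd/ceph-<id> and the block / block.db names are assumptions about a typical non-containerized deployment, and the helper names are hypothetical. Both tools need the OSD to be stopped first, so check the man pages for your release before running anything.

```python
import shlex


def migrate_cmd(osd_id, root="/var/lib/ceph/osd"):
    """Command to move BlueFS data from the main (slow) device to the DB device.

    Assumes the usual layout where 'block' is the main device and
    'block.db' is the DB device inside the OSD directory.
    """
    path = f"{root}/ceph-{osd_id}"
    return [
        "ceph-bluestore-tool", "bluefs-bdev-migrate",
        "--path", path,
        "--devs-source", f"{path}/block",
        "--dev-target", f"{path}/block.db",
    ]


def compact_cmd(osd_id, root="/var/lib/ceph/osd"):
    """Command for an offline RocksDB compaction of a BlueStore OSD."""
    return ["ceph-kvstore-tool", "bluestore-kv", f"{root}/ceph-{osd_id}", "compact"]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in (migrate_cmd(0), compact_cmd(0)):
        print(shlex.join(cmd))
```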

Thanks,

Igor

On 9/27/2019 11:54 AM, Eugen Block wrote:

Update: I expanded all rocksDB devices, but the warnings still appear:

BLUEFS_SPILLOVER BlueFS spillover detected on 10 OSD(s)
     osd.0 spilled over 2.5 GiB metadata from 'db' device (2.4 GiB used of 30 GiB) to slow device
     osd.19 spilled over 66 MiB metadata from 'db' device (818 MiB used of 15 GiB) to slow device
     osd.25 spilled over 2.2 GiB metadata from 'db' device (2.6 GiB used of 30 GiB) to slow device
     osd.26 spilled over 1.6 GiB metadata from 'db' device (1.9 GiB used of 30 GiB) to slow device
     osd.27 spilled over 2.6 GiB metadata from 'db' device (2.5 GiB used of 30 GiB) to slow device
     osd.28 spilled over 2.4 GiB metadata from 'db' device (1.3 GiB used of 30 GiB) to slow device
     osd.29 spilled over 2.9 GiB metadata from 'db' device (1.7 GiB used of 30 GiB) to slow device
     osd.31 spilled over 2.2 GiB metadata from 'db' device (2.7 GiB used of 30 GiB) to slow device
     osd.32 spilled over 2.4 GiB metadata from 'db' device (1.7 GiB used of 30 GiB) to slow device
     osd.33 spilled over 2.2 GiB metadata from 'db' device (2.0 GiB used of 30 GiB) to slow device

OSD.19 can be ignored as it's currently not in use, but the other 
devices have been expanded from 20 to 30 GB (following the 
explanations about the compaction levels).
According to the OSD logs these are the sizes we're dealing with:

Level  Size
L0     31.84 MB
L1     183.86 MB
L2     923.67 MB
L3     3.62 GB
Sum    4.74 GB
Int    0.00

Is there any sign that these OSDs would require even larger bdev 
devices (300 GB)? That would not be possible with the currently used 
SSDs, unfortunately.
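
As background for the 30 GB and 300 GB figures, here is a minimal sketch of the level arithmetic. It assumes RocksDB's stock sizing (a 256 MiB base for L1 and a 10x multiplier per level), which may not match a cluster with customized bluestore_rocksdb_options. It also assumes the level sizes above are in MB for L0-L2 and GB for L3 and Sum; the units are not shown in the table but are consistent with the reported sum.

```python
def cumulative_level_gib(base_mib=256, multiplier=10, levels=4):
    """Space needed to hold L1..Ln completely, in GiB, for n = 1..levels.

    base_mib and multiplier are assumed RocksDB defaults, not values
    read from this cluster.
    """
    totals = []
    running = 0
    for n in range(levels):
        running += base_mib * multiplier ** n  # target size of level n+1
        totals.append(running / 1024)
    return totals


def reported_sum_gb():
    """Add up the per-level sizes from the OSD log (units assumed, see above)."""
    mb_levels = [31.84, 183.86, 923.67]  # L0, L1, L2 in MB
    return sum(mb_levels) / 1024 + 3.62   # plus L3 in GB


print(cumulative_level_gib())  # -> [0.25, 2.75, 27.75, 277.75]
print(round(reported_sum_gb(), 2))
```

Under these assumptions, holding everything up to L3 takes just under 28 GiB and everything up to L4 just under 278 GiB, which is where the roughly 30 GB and 300 GB rules of thumb come from. The data reported here only reaches L3.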

Is there anything else I can do without recreating the OSDs?

Thanks,
Eugen

Quoting Konstantin Shalygin <k0ste@xxxxxxxx>:

On 9/26/19 9:45 PM, Eugen Block wrote:
I'm following the discussion for a tracker issue [1] about spillover 
warnings that affects our upgraded Nautilus cluster.
Just to clarify, would a resize of the rocksDB volume (and expanding 
with 'ceph-bluestore-tool bluefs-bdev-expand...') resolve that or do 
we have to recreate every OSD?

Yes, this works since Luminous 12.2.11

k

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx