Re: Low level bluestore usage

"Alexander E. Patrakov" <patrakov@xxxxxxxxx> · Wed, 23 Sep 2020 06:09:36 +0500

On Wed, Sep 23, 2020 at 3:03 AM Ivan Kurnosov <zerkms@xxxxxxxxxx> wrote:
>
> Hi,
>
> this morning I woke up to a degraded test ceph cluster (managed by rook,
> but it does not really change anything for the question I'm about to ask).
>
> After checking the logs I found that bluestore on one of the OSDs ran out
> of space.

I think this is a consequence, and the real error is something else
that happened before.

The problem is that, while the cluster is unhealthy, the MON store
accumulates a lot of osdmaps and pgmaps and does not trim them
automatically, because the MONs consider that these old versions might
still be needed. If I understand correctly, OSDs also receive copies of
these maps, which is why small OSDs fill up quickly if the cluster
stays unhealthy for a few hours.

> So, my question is: how could I have prevented that? According to my
> monitoring (Prometheus), the OSDs are healthy and have plenty of space,
> yet they are not.
>
> What command (and Prometheus metric) would help me understand the actual
> bluestore usage? Or am I missing something?

You can fix monitoring by lowering the "mon data size warn" threshold
to something like 1 GB or even less.
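As a sketch, assuming a release with the centralized configuration database (Mimic or later; in a Rook deployment this would be run from the toolbox pod), the threshold can be lowered at runtime. The option is `mon_data_size_warn`, its value is in bytes (1073741824 is 1 GiB), the default is 15 GiB, and exceeding it raises a MON_DISK_BIG health warning.

```shell
# Raise a MON_DISK_BIG health warning once a MON store exceeds 1 GiB.
# The value is in bytes; the default threshold is 15 GiB.
ceph config set mon mon_data_size_warn 1073741824
```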

> Oh, and I "fixed" the cluster by expanding the broken osd.0 with a larger
> 15GB volume. And 2 other OSDs still run on 10GB volumes.

Sometimes this doesn't help. For data recovery purposes, the most
helpful step if you get the "bluefs enospc" error is to add a separate
db device, like this:

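# stop the affected OSD and keep it from restarting
systemctl disable --now ceph-osd@${OSDID}
# sparse 32G file that will hold the new DB (its directory must exist first)
mkdir -p /junk/osd.${OSDID}-recover
truncate -s 32G /junk/osd.${OSDID}-recover/block.db
# write a GPT with a single partition spanning the file
sgdisk -n 0:0:0 /junk/osd.${OSDID}-recover/block.db
# attach the file to the OSD as a new BlueFS DB device
ceph-bluestore-tool \
    bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-${OSDID} \
    --dev-target /junk/osd.${OSDID}-recover/block.db \
    --bluestore-block-db-size=31G --bluefs-log-compact-min-size=31G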

Of course you can use a real block device instead of just a file.

After that, export all PGs using ceph-objectstore-tool, re-import them
into a fresh OSD, and then destroy or purge the full one.
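The export/import step can be sketched as two shell functions. This is a minimal sketch, not a tested procedure: it assumes both OSDs are stopped and their data directories follow the default /var/lib/ceph/osd/ceph-<id> layout. It also assumes the export directory has room for all PGs. The function names, the export directory, and the COT override variable are hypothetical. Only the ceph-objectstore-tool operations list-pgs, export and import are used.

```shell
#!/bin/sh
# Minimal sketch: move all PGs from a stopped, full OSD to a fresh, stopped OSD.
# Assumptions (hypothetical): default data dirs /var/lib/ceph/osd/ceph-<id>;
# COT may be set to override the ceph-objectstore-tool binary.
COT=${COT:-ceph-objectstore-tool}

# Export every PG of OSD $1 into directory $2, one <pgid>.export file per PG.
export_all_pgs() {
    src="/var/lib/ceph/osd/ceph-$1"
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    pgs=$("$COT" --data-path "$src" --op list-pgs) || return 1
    for pg in $pgs; do
        "$COT" --data-path "$src" --pgid "$pg" \
            --op export --file "$out_dir/$pg.export" || return 1
    done
}

# Import every *.export file found in directory $2 into OSD $1.
import_all_pgs() {
    dst="/var/lib/ceph/osd/ceph-$1"
    in_dir=$2
    for f in "$in_dir"/*.export; do
        [ -e "$f" ] || continue   # the glob matched nothing
        "$COT" --data-path "$dst" --op import --file "$f" || return 1
    done
}
```

For example, `export_all_pgs 0 /junk/pg-export` followed by `import_all_pgs 5 /junk/pg-export`, where 5 stands for the id of the fresh OSD. Once the data is confirmed to be back, `ceph osd purge 0 --yes-i-really-mean-it` removes the old OSD. Stopping on the first failed PG is deliberate, so that a partial export is noticed before anything is purged.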

Here is why these options are needed:

--bluestore-block-db-size=31G: ceph-bluestore-tool refuses to do
anything if this option is not set
--bluefs-log-compact-min-size=31G: make absolutely sure that log
compaction doesn't happen, because it would hit "bluefs enospc" again.

-- 
Alexander E. Patrakov
CV: http://pc.cd/PLz7
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx