Re: Fwd: BlueFS spillover yet again

Thank you for the insight.

> If you're using the default options for rocksdb, then the size of L3 will
> be 25GB

Where does this number come from? Is there any documentation I can read?
I want to get a better understanding of how the DB size is calculated.
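My current guess at the arithmetic, assuming RocksDB's stock
max_bytes_for_level_base of 256 MB and a level multiplier of 10 (I have not
checked whether Ceph overrides these in bluestore_rocksdb_options), is the
rough sketch below:

base = 256 * 1024 ** 2   # default max_bytes_for_level_base (my assumption)
mult = 10                # default max_bytes_for_level_multiplier (my assumption)
levels = [base * mult ** n for n in range(3)]   # target sizes of L1, L2, L3
for n, size in enumerate(levels, start=1):
    print("L%d: %.2f GiB" % (n, size / 1024 ** 3))
print("L1+L2+L3: %.2f GiB" % (sum(levels) / 1024 ** 3))

That gives roughly 25 GiB for L3 alone and about 28 GiB for L1+L2+L3, which
would explain the 30GB+ advice, but please correct me if that is not where
the number comes from.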

On Wed, 5 Feb 2020 at 18:53, Moreno, Orlando <orlando.moreno@xxxxxxxxx> wrote:

> Hi Vladimir,
>
> If you're using the default options for rocksdb, then the size of L3 will
> be 25GB. Since your block-db is only 20GB, and a level can only be placed on
> the DB device if the entire level's size is available there, BlueFS will
> begin to spill over. Like Igor said, having 30GB+ is recommended if you want
> to host up to 3 levels of rocksdb on the SSD.
>
> Thanks,
> Orlando
>
> -----Original Message-----
> From: Igor Fedotov <ifedotov@xxxxxxx>
> Sent: Wednesday, February 5, 2020 7:04 AM
> To: Vladimir Prokofev <v@xxxxxxxxxxx>; ceph-users@xxxxxxx
> Subject:  Re: Fwd: BlueFS spillover yet again
>
> Hi Vladimir,
>
> there have been plenty of discussions/recommendations here around DB volume
> size selection.
>
> In short, it's advised to have a DB volume of 30 - 64GB for most use cases.
>
> Thanks,
>
> Igor
>
> On 2/5/2020 4:21 PM, Vladimir Prokofev wrote:
> > Cluster upgraded from 12.2.12 to 14.2.5. Everything went smoothly, except
> > for a BlueFS spillover warning.
> > We create OSDs with ceph-deploy; the command goes like this:
> > ceph-deploy osd create --bluestore --data /dev/sdf --block-db /dev/sdb5
> > --block-wal /dev/sdb6 ceph-osd3
> > where block-db and block-wal are SSD partitions.
> > The default ceph-deploy settings created ~1GB partitions, which is, of
> > course, too small. So we redeployed the OSDs using a manually partitioned
> > SSD for block-db/block-wal, with sizes of 20G/5G respectively.
> > But now we still get a BlueFS spillover warning for the redeployed OSDs:
> >     osd.10 spilled over 2.4 GiB metadata from 'db' device (2.8 GiB used of 19 GiB) to slow device
> >     osd.19 spilled over 3.7 GiB metadata from 'db' device (2.7 GiB used of 19 GiB) to slow device
> >     osd.20 spilled over 4.2 GiB metadata from 'db' device (2.6 GiB used of 19 GiB) to slow device
> > OSD size is 1.8 TiB.
> >
> > These OSDs are used primarily for RBD as backup drives, so a lot of
> > snapshots are held there. They also have an RGW pool assigned to them, but
> > it has no data.
> > I know of the sizing recommendations[1] for block-db/block-wal, but I
> > assumed that since it's primarily RBD, 1% (~20G) should be enough.
> > Also, the compaction stats don't make sense to me[2]. They state that the
> > sum of the DB levels is only 5.08GB, which should fit on block-db without
> > a problem?
> > Am I understanding all this wrong? Should the block-db size be greater in
> > my case?
> >
> > [1]
> > https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
> > [2] osd.10 logs as an example
> > https://pastebin.com/hC6w6jSn
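> > For reference, the actual per-device BlueFS usage can also be read straight
> > off the admin socket with a quick script along these lines (the counter
> > names are my assumption from what 14.2.x "perf dump" output looks like):
> >
> > import json, subprocess
> > b = json.loads(subprocess.check_output(
> >     ["ceph", "daemon", "osd.10", "perf", "dump"]))["bluefs"]
> > gib = 1024 ** 3
> > print("db:   %.1f / %.1f GiB used" % (b["db_used_bytes"] / gib,
> >                                       b["db_total_bytes"] / gib))
> > print("slow: %.1f GiB spilled to the slow device"
> >       % (b["slow_used_bytes"] / gib))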
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



