Re: bluestore block/db/wal sizing (Was: bluefs-bdev-expand experience)

On Fri, Apr 12, 2019 at 08:06:49AM -0400, Alfredo Deza wrote:
> On Thu, Apr 11, 2019 at 4:23 PM Yury Shevchuk <sizif@xxxxxxxx> wrote:
[quoting trimmed]
> >
> > I used this (admittedly not very recent) message as a guide for
> > volume sizing:
> >
> >   https://www.spinics.net/lists/ceph-devel/msg37804.html
> >
> > It reads: "1GB for block.wal.  For block.db, as much as you have."
> 
> Hey Yury, I've tried to further clarify this part of configuring
> Bluestore OSDs because "as large as possible" isn't accurate enough
> and ends up raising more questions.
> 
> Have you read the sizing section here?
> http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
> 
> block.db should be at least 4% the size of block. So if your block
> device is 1TB your block.db shouldn't be less than 40GB.
> 
> I didn't see any mention of NVMe or SSDs in your message; if that is
> the case, a separate block.db is not required at all, and you can just
> create block and be done with it.
> 
> If I did miss an SSD mention in this thread, then block.db on the
> fast device is what is recommended; and lastly, no block.wal is
> required unless you have something faster than the block.db.
> 
> If the doc link isn't clear enough, I would like to hear about it so I
> can improve it further!

Hi Alfredo,

This phrase in the doc confused me: "Generally, block.db should have
as large as possible logical volumes".  It made me think: the bigger
the NVMe/SSD, the happier Ceph will be?  The doc gives no hint at
which point the money invested in an SSD stops paying off, and no
explanation of the trade-off.

The following sentence was enlightening:

"If they are on separate devices, then you need to make it as big as
you need to ensure that it won't spill over (or if it does that you're
ok with the degraded performance while the db partition is full)."

The quote is from
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/021030.html
(Thanks to Igor for pointing me in the right direction)

As to separate block/db/wal in my setup that lacks an SSD: this is an
attempt to secure a growth path.  If we ever start struggling with OSD
performance, we can add an SSD to the LVM volume group and pvmove(8)
block.db to the SSD without recreating the OSD, even without stopping
it.  If we start seeing spillover warnings
(https://tracker.ceph.com/issues/38745), we can lvextend(8) block.db,
adding another SSD if necessary, and then run bluefs-bdev-expand.  We
can also blktrace(8) each volume separately to identify where the
bottlenecks are.  That is the point of creating separate block/db/wal
even in a single-drive setup.  However, this is all speculation; we
have very little Ceph experience as of now.
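
Roughly the steps I have in mind, as an untested sketch (the device
names, VG/LV names, sizes and OSD path below are made up for
illustration):

    # Add the new SSD to the existing volume group and move only the
    # block.db LV onto it; pvmove(8) can do this while the OSD runs:
    pvcreate /dev/nvme0n1
    vgextend ceph-vg /dev/nvme0n1
    pvmove -n osd0-db /dev/sdb /dev/nvme0n1

    # If we later see the BlueFS spillover health warning, grow the db
    # LV on the SSD and then let BlueFS take up the extra space:
    lvextend -L +30G ceph-vg/osd0-db /dev/nvme0n1
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0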

BTW, to use pvmove(8) one needs to create all volumes in one LVM
volume group, not a separate VG for each LV as the doc suggests:
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#block-and-block-db
LVs can still be placed on the intended PVs with the
PhysicalVolumePath argument to lvcreate(8).
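
For example (again only a sketch, with made-up VG/PV/LV names and
sizes, assuming an HDD plus an SSD):

    # One VG spanning both the slow and the fast drive...
    pvcreate /dev/sdb /dev/nvme0n1
    vgcreate ceph-vg /dev/sdb /dev/nvme0n1

    # ...but each LV pinned to a particular PV by naming it after the
    # size: block stays on the HDD, block.db goes to the SSD.  A
    # block.wal LV could be created the same way.
    lvcreate -n osd0-block -l 100%PVS ceph-vg /dev/sdb
    lvcreate -n osd0-db    -L 40G     ceph-vg /dev/nvme0n1

    ceph-volume lvm create --bluestore \
        --data ceph-vg/osd0-block --block.db ceph-vg/osd0-db

With everything in one VG, pvmove(8) can later shuffle any of the LVs
between PVs without touching the OSD.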

Regards,


-- Yury

[quoting trimmed]
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


