On Mon, Sep 23, 2019 at 10:51 AM Shawn A Kwang <kwangs@xxxxxxx> wrote:
>
> On 9/23/19 9:38 AM, Robert LeBlanc wrote:
> > On Wed, Sep 18, 2019 at 11:47 AM Shawn A Kwang <kwangs@xxxxxxx> wrote:
> >>
> >> We are planning our Ceph architecture and I have a question:
> >>
> >> How should NVMe drives be used when our spinning storage devices use
> >> bluestore:
> >>
> >> 1. block WAL and DB partitions
> >> (https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/)
> >> 2. Cache tier
> >> (https://docs.ceph.com/docs/nautilus/rados/operations/cache-tiering/)
> >> 3. Something else?
> >>
> >> Hardware - each node has:
> >> 3x 8 TB HDD
> >> 1x 450 GB NVMe drive
> >> 192 GB RAM
> >> 2x Xeon CPUs (24 cores total)
> >>
> >> I plan to have three OSD daemons running on each node. There are 95
> >> nodes total with the same hardware.
> >>
> >> Use case:
> >>
> >> The plan is to create a CephFS filesystem and use it to store people's
> >> home directories and data. I anticipate more read operations than
> >> writes.
> >>
> >> Regarding cache tiering: the online documentation says cache tiering
> >> will often degrade performance, but when I read various threads on this
> >> ML there do seem to be people using cache tiering with success. I do
> >> see that it is heavily dependent upon one's use case. In 2019, are
> >> there any updated recommendations as to whether to use cache tiering?
> >>
> >> If people have a third suggestion, I would be interested in hearing it.
> >> Thanks in advance.
> >
> > I've had good success when I've been able to hold all the 'hot' data
> > for 24 hours in a cache tier. That reduces the amount of data being
> > evicted from and added to the tier, which reduces the penalty from
> > those operations. You can adjust the config (hit rate, etc.) to help
> > reduce promotions for rarely accessed objects. The NVMe drive at that
> > size may be best suited for a WAL for each OSD (I highly recommend
> > that for any HDD install); then carve out the rest as an SSD pool that
> > you can put the CephFS metadata pool on. I don't think you would have
> > a good experience with a cache tier at that size. However, you know
> > your access patterns far better than I do, and it may be a good fit.
>
> Robert,
>
> I like your idea of partitioning each SSD for bluestore's DB [1], and
> then using the extra space for the CephFS metadata pool.
>
> [1] Question: You wrote 'WAL', but did you mean block.wal or block.db?
> Or both?

Yes, DB + WAL is what I meant, sorry. If you specify a DB device, the WAL
will go with it automatically.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
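
For reference, a rough sketch of the layout described above. The device
names, partition layout, and the pool name cephfs_metadata are hypothetical
and would need to be adapted to the actual hardware:

  # One OSD per HDD, with block.db (and therefore the WAL) placed on an
  # NVMe partition; repeat for each HDD/partition pair on the node.
  ceph-volume lvm create --bluestore --data /dev/sda --block.db /dev/nvme0n1p1

  # Use the leftover NVMe space as an SSD-class OSD for the CephFS
  # metadata pool.
  ceph-volume lvm create --bluestore --data /dev/nvme0n1p4

  # CRUSH rule restricted to the ssd device class, then pin the metadata
  # pool to it (check the class reported by 'ceph osd tree' first).
  ceph osd crush rule create-replicated ssd-rule default host ssd
  ceph osd pool set cephfs_metadata crush_rule ssd-rule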