On Mon, Sep 23, 2019 at 10:51 AM Shawn A Kwang <kwangs@xxxxxxx> wrote:
>
> On 9/23/19 9:38 AM, Robert LeBlanc wrote:
> > On Wed, Sep 18, 2019 at 11:47 AM Shawn A Kwang <kwangs@xxxxxxx> wrote:
> >>
> >> We are planning our Ceph architecture and I have a question:
> >>
> >> How should NVMe drives be used when our spinning storage devices use
> >> bluestore:
> >>
> >> 1. block WAL and DB partitions
> >> (https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/)
> >> 2. Cache tier
> >> (https://docs.ceph.com/docs/nautilus/rados/operations/cache-tiering/)
> >> 3. Something else?
> >>
> >> Hardware - each node has:
> >> 3x 8 TB HDD
> >> 1x 450 GB NVMe drive
> >> 192 GB RAM
> >> 2x Xeon CPUs (24 cores total)
> >>
> >> I plan to have three OSD daemons running on each node. There are 95
> >> nodes total with the same hardware.
> >>
> >> Use case:
> >>
> >> The plan is to create a CephFS filesystem and use it to store people's
> >> home directories and data. I anticipate more read operations than
> >> writes.
> >>
> >> Regarding cache tiering: the online documentation says cache tiering
> >> will often degrade performance, but when I read various threads on this
> >> ML there do seem to be people using cache tiering with success. I do
> >> see that it is heavily dependent upon one's use case. In 2019, are
> >> there any updated recommendations as to whether to use cache tiering?
> >>
> >> If people have a third suggestion, I would be interested in hearing it.
> >> Thanks in advance.
> >
> > I've had good success when I've been able to hold all the 'hot' data
> > for 24 hours in a cache tier. That reduces the amount of data being
> > evicted from and added to the tier, which reduces the penalty from
> > those operations. You can adjust the config (hit rate, etc.) to help
> > reduce promotions for rarely accessed objects. The NVMe drive at that
> > size may be best suited for a WAL for each OSD (I highly recommend
> > that for any HDD install); then carve out the rest as an SSD pool that
> > you can put the CephFS metadata pool on. I don't think you would have
> > a good experience with a cache tier at that size. However, you know
> > your access patterns far better than I do, and it may be a good fit.
>
> Robert,
>
> I like your idea of partitioning each SSD for bluestore's DB [1], and
> then using the extra space for the CephFS metadata pool.
>
> [1] Question: You wrote 'WAL', but did you mean block.wal or block.db?
> Or both?

Yes, DB + WAL is what I meant, sorry. If you specify a DB device, the WAL
will go with it automatically.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
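
For reference, a rough sketch of the layout described above. The device
names, partition layout, and the pool name cephfs_metadata are hypothetical
and would need to be adapted to the actual hardware:

  # One OSD per HDD, with block.db (and therefore the WAL) placed on an
  # NVMe partition; repeat for each HDD/partition pair on the node.
  ceph-volume lvm create --bluestore --data /dev/sda --block.db /dev/nvme0n1p1

  # Use the leftover NVMe space as an SSD-class OSD for the CephFS
  # metadata pool.
  ceph-volume lvm create --bluestore --data /dev/nvme0n1p4

  # CRUSH rule restricted to the ssd device class, then pin the metadata
  # pool to it (check the class reported by 'ceph osd tree' first).
  ceph osd crush rule create-replicated ssd-rule default host ssd
  ceph osd pool set cephfs_metadata crush_rule ssd-rule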