Re: Bluestore vs. Filestore


 



On 02.10.2018 19:28, jesper@xxxxxxxx wrote:
Hi.

Based on some recommendations we have set up our CephFS installation using
bluestore*. We're trying to build a solid replacement for a "huge" xfs+NFS
server - 100TB-ish in size.

Our current setup is a sizeable Linux host with 512GB of memory, one large
Dell MD1200 or MD1220 (100TB+), and a Linux kernel NFS server.

Since our "hot" dataset is < 400GB we can actually serve the hot data
directly out of the host page-cache and never really touch the "slow"
underlying drives, except when new bulk data is written, in which case a
PERC with BBWC absorbs the writes.

In the CephFS + Bluestore world, Ceph "deliberately" bypasses the host
OS page-cache, so even with 4-5 x 256GB of memory** in the OSD hosts
it is really hard to create a synthetic test where the hot data does not
end up being read from the underlying disks. Yes, the
client-side page cache works very well, but in our scenario we have 30+
hosts pulling the same data over NFS.

Is Bluestore just a "bad fit", while Filestore "should" do the right thing?
Is the recommendation to add an SSD "overlay" on top of the slow drives?

Thoughts?

Jesper

* Bluestore should be the new and shiny future - right?
** Total mem 1TB+




In the CephFS world there is no central server that holds the cache; each
CephFS client reads data directly from the OSDs. This also means there is no
single point of failure, and you can scale out performance by spreading the
metadata tree across multiple MDS servers, and scale out storage and
throughput with added OSD nodes.
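For illustration (the filesystem name "cephfs" is a placeholder for your
own), allowing a second active MDS to share the metadata tree looks roughly
like this:

    # promote a standby so two MDS daemons serve the metadata tree
    ceph fs set cephfs max_mds 2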

So if the CephFS client cache is not sufficient, you can look at the
Bluestore cache:
http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/#cache-size
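As a minimal sketch (the 8 GiB figure is an assumption to fit your RAM
budget, not a recommendation), raising the per-OSD cache in ceph.conf could
look like:

    [osd]
    # per-OSD Bluestore cache; defaults are 1 GiB (HDD) and 3 GiB (SSD)
    bluestore_cache_size_hdd = 8589934592   # 8 GiB, assumed budget
    bluestore_cache_size_ssd = 8589934592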

Or you can look at adding an SSD layer over the spinning disks, with e.g.
bcache. I assume you are already using an SSD/NVRAM device for the Bluestore
DB.
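A rough sketch of the bcache side (device names are placeholders for your
hardware):

    # format the spinning disk as a bcache backing device
    make-bcache -B /dev/sdb
    # format an SSD partition as the cache device
    make-bcache -C /dev/nvme0n1p1
    # attach the cache set to the backing device (UUID from bcache-super-show)
    echo <cset-uuid> > /sys/block/bcache0/bcache/attach
    # the OSD is then deployed on top of /dev/bcache0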

You should also look at tuning the CephFS metadata servers.
Make sure the metadata pool is on fast SSD OSDs, and tune the MDS cache to
the MDS server's RAM, so you cache as much metadata as possible.
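A minimal sketch of both steps, assuming the metadata pool is named
"cephfs_metadata" and 32 GiB of RAM can be spared per MDS:

    # pin the metadata pool to SSD-backed OSDs via a device-class CRUSH rule
    ceph osd crush rule create-replicated ssd-rule default host ssd
    ceph osd pool set cephfs_metadata crush_rule ssd-rule

    # raise the MDS cache limit (Luminous+ option); 32 GiB assumed
    ceph config set mds mds_cache_memory_limit 34359738368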

Good luck,
Ronny Aasen









