Re: Bluestore vs. Filestore


 



On 02.10.2018 19:28, jesper@xxxxxxxx wrote:
Hi.

Based on some recommendations we have set up our CephFS installation using
bluestore*. We're trying to build a solid replacement for a "huge" xfs+NFS
server - 100TB-ish in size.

Our current setup is a sizeable Linux host with 512GB of memory, one large
Dell MD1200 or MD1220 (100TB+), and a Linux kernel NFS server.

Since our "hot" dataset is < 400GB we can actually serve the hot data
directly out of the host page-cache and never really touch the "slow"
underlying drives, except when new bulk data is written, in which case a
PERC with BBWC absorbs the writes.

In the CephFS + Bluestore world, Ceph "deliberately" bypasses the host
OS page-cache, so even with 4-5 x 256GB of memory** in the OSD hosts
it is really hard to create a synthetic test where the hot data does not
end up being read from the underlying disks. Yes, the
client-side page cache works very well, but in our scenario we have 30+
hosts pulling the same data over NFS.

Is Bluestore just a "bad fit", while Filestore "should" do the right thing?
Is the recommendation to add an SSD "overlay" on top of the slow drives?

Thoughts?

Jesper

* Bluestore should be the new and shiny future - right?
** Total mem 1TB+




In the CephFS world there is no central server that holds the cache; each
CephFS client reads data directly from the OSDs. This also means there is no
single point of failure, and you can scale out performance by spreading the
metadata tree across multiple MDS servers, and scale out storage and
throughput with added OSD nodes.
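For illustration (the filesystem name "cephfs" is a placeholder for your
own), allowing a second active MDS to share the metadata tree looks roughly
like this:

    # promote a standby so two MDS daemons serve the metadata tree
    ceph fs set cephfs max_mds 2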

So if the CephFS client cache is not sufficient, you can look at the
Bluestore cache:
http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/#cache-size
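As a minimal sketch (the 8 GiB figure is an assumption to fit your RAM
budget, not a recommendation), raising the per-OSD cache in ceph.conf could
look like:

    [osd]
    # per-OSD Bluestore cache; defaults are 1 GiB (HDD) and 3 GiB (SSD)
    bluestore_cache_size_hdd = 8589934592   # 8 GiB, assumed budget
    bluestore_cache_size_ssd = 8589934592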

Or you can look at adding an SSD layer over the spinning disks, with e.g.
bcache. I assume you are already using an SSD/NVRAM device for the Bluestore
DB.
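A rough sketch of the bcache side (device names are placeholders for your
hardware):

    # format the spinning disk as a bcache backing device
    make-bcache -B /dev/sdb
    # format an SSD partition as the cache device
    make-bcache -C /dev/nvme0n1p1
    # attach the cache set to the backing device (UUID from bcache-super-show)
    echo <cset-uuid> > /sys/block/bcache0/bcache/attach
    # the OSD is then deployed on top of /dev/bcache0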

You should also look at tuning the CephFS metadata servers.
Make sure the metadata pool is on fast SSD OSDs, and tune the MDS cache to
the MDS server's RAM, so you cache as much metadata as possible.
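A minimal sketch of both steps, assuming the metadata pool is named
"cephfs_metadata" and 32 GiB of RAM can be spared per MDS:

    # pin the metadata pool to SSD-backed OSDs via a device-class CRUSH rule
    ceph osd crush rule create-replicated ssd-rule default host ssd
    ceph osd pool set cephfs_metadata crush_rule ssd-rule

    # raise the MDS cache limit (Luminous+ option); 32 GiB assumed
    ceph config set mds mds_cache_memory_limit 34359738368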

Good luck,
Ronny Aasen









