Re: Bluestore vs. Filestore

I would never start a new cluster with Filestore nowadays. Sure, there
are a few minor issues with Bluestore, such as currently requiring
some manual configuration for the cache, but overall Bluestore is so
much better.
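
The cache is configured per OSD daemon; something like the snippet
below in ceph.conf should do it. The 8 GiB figure is only an example,
size it to the RAM you actually have per OSD:

    [osd]
    # per-OSD Bluestore cache, in bytes; pick values matching your RAM
    bluestore_cache_size_hdd = 8589934592   # 8 GiB for HDD-backed OSDs
    bluestore_cache_size_ssd = 8589934592   # 8 GiB for SSD-backed OSDs
    # or set bluestore_cache_size to override both at once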

Your use case sounds like it might benefit from the RADOS cache tier
feature. It's a rarely used feature because it only works well in very
specific circumstances, but your scenario sounds like it might be one
of them. Definitely worth giving it a try. dm-cache with LVM *might*
also help.
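
If you want to try the cache tier route, the basic setup looks roughly
like the sketch below. Pool names and the sizing values are just
placeholders, not a tested recipe; read the cache tiering docs
carefully before pointing this at a production pool:

    # small replicated pool on the SSDs, placed in front of the data pool
    ceph osd pool create cephfs_data_cache 128 128
    ceph osd tier add cephfs_data cephfs_data_cache
    ceph osd tier cache-mode cephfs_data_cache writeback
    ceph osd tier set-overlay cephfs_data cephfs_data_cache
    # hit set tracking is required for writeback mode
    ceph osd pool set cephfs_data_cache hit_set_type bloom
    ceph osd pool set cephfs_data_cache hit_set_count 12
    ceph osd pool set cephfs_data_cache hit_set_period 3600
    # cap the cache pool somewhere around your hot working set
    ceph osd pool set cephfs_data_cache target_max_bytes 400000000000
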
But if your active working set really is just 400GB, the Bluestore
cache should handle it just fine. Don't worry about "unequal"
distribution: every 4MB chunk of every file will end up on an
effectively random OSD.
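
You can see that striping on any file via the CephFS virtual xattrs.
The mount point, path, pool name, and object name below are just
examples from a hypothetical cluster:

    getfattr -n ceph.file.layout /mnt/cephfs/some/file
    # ceph.file.layout="stripe_unit=4194304 stripe_count=1
    #                   object_size=4194304 pool=cephfs_data"
    # and to see which OSDs a given 4MB object lands on:
    ceph osd map cephfs_data 10000000abc.00000000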

One very powerful and simple optimization is moving the metadata pool
to SSDs only. Even if it's just 3 small but fast SSDs, that can make a
huge difference to how fast your filesystem "feels".
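
With device classes (Luminous and later) that's just a CRUSH rule; the
rule and pool names below are examples, adjust them to your cluster:

    ceph osd crush rule create-replicated ssd-only default host ssd
    ceph osd pool set cephfs_metadata crush_rule ssd-only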


Paul



On Wed, Oct 3, 2018 at 11:49 John Spray <jspray@xxxxxxxxxx> wrote:
>
> On Tue, Oct 2, 2018 at 6:28 PM <jesper@xxxxxxxx> wrote:
> >
> > Hi.
> >
> > Based on some recommendations we have set up our CephFS installation
> > using Bluestore*. We're trying to get a strong replacement for a
> > "huge" xfs+NFS server - 100TB-ish in size.
> >
> > The current setup is a sizeable Linux host with 512GB of memory, one
> > large Dell MD1200 or MD1220 (100TB), plus a Linux kernel NFS server.
> >
> > Since our "hot" dataset is < 400GB, we can actually serve the hot data
> > directly out of the host page cache and never really touch the "slow"
> > underlying drives, except when new bulk data is written, in which case
> > a PERC with BBWC absorbs the writes.
> >
> > In the CephFS + Bluestore world, Ceph is "deliberately" bypassing the
> > host OS page cache, so even when we have 4-5 x 256GB memory** in the
> > OSD hosts, it is really hard to create a synthetic test where the hot
> > data does not end up being read from the underlying disks. Yes, the
> > client-side page cache works very well, but in our scenario we have
> > 30+ hosts pulling the same data over NFS.
>
> Are you finding that the OSDs use lots of memory but you're still
> hitting disk, or just that the OSDs aren't using up all the available
> memory?  Unlike the page cache, the OSDs will not use all the memory
> in your system by default; you have to tell them how much to use
> (http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/#cache-size)
>
> John
>
> > Is Bluestore just a "bad fit", and would Filestore "do the right thing"?
> > Is the recommendation to make an SSD "overlay" on top of the slow drives?
> >
> > Thoughts?
> >
> > Jesper
> >
> > * Bluestore should be the new and shiny future - right?
> > ** Total mem 1TB+
> >
> >
> >



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



