Re: Bluestore vs. Filestore

Hello,

this has come up before; see my thread
"Bluestore caching, flawed by design?" for starters, if you haven't
already.

I'll have to build a new Ceph cluster next year and am also less than
impressed with the choices at this time:

1. Bluestore is the new shiny, filestore is going to die (and already did
with regard to EXT4, which in my experience and use case was MUCH better
than XFS). Never mind the very bleeding-edge vibe I'm still getting
from bluestore two major releases in.

2. Caching is currently not up to snuff.
To get the same level of caching as with the pagecache AND be safe in
heavy recovery/rebalance situations, one needs a lot more RAM, 30% more at
least would be my guess (a rough back-of-envelope sketch follows this list).

3. Small IOPS (for my use case) can't be guaranteed to behave the same as
with filestore and SSD journals; some IOPS will go to disk directly
(see the second sketch below the list).

4. Cache tiers are deprecated,
despite working beautifully for my use case and probably for yours as well.
The proposed alternatives leave me cold; I tried them all a year ago.
LVM dm-cache was a nightmare to configure (documentation and complexity)
and performed abysmally compared to bcache.
While bcache is fast (I'm using it on two non-Ceph systems), it has other
issues: occasional load spikes (often at near-idle times), it can be crashed
by querying its sysfs counters, and it doesn't honor IO priorities...
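
For what it's worth, here is the kind of back-of-envelope estimate behind
that 30% guess for point 2, as a small Python sketch. Every number in it is
an assumption for illustration, not a measurement from any cluster:

GiB = 2**30

osds_per_host = 12                  # assumed
hot_working_set = 400 * GiB         # what the pagecache used to hold for free
osd_baseline = 2 * GiB              # rough RSS of an OSD outside recovery, assumed
recovery_spike_per_osd = 10 * GiB   # pessimistic backfill/pglog headroom, assumed

# Filestore/pagecache world: the cache is simply free RAM and shrinks on
# its own when OSDs balloon during recovery.
pagecache_host_ram = osds_per_host * osd_baseline + hot_working_set

# Bluestore world: the cache is carved up per OSD (bluestore_cache_size)
# and is not handed back under memory pressure, so recovery headroom has
# to be provisioned on top of it.
bluestore_host_ram = (osds_per_host * (osd_baseline + recovery_spike_per_osd)
                      + hot_working_set)

extra = bluestore_host_ram / pagecache_host_ram - 1
print(f"pagecache-style host: {pagecache_host_ram / GiB:.0f} GiB")
print(f"bluestore-style host: {bluestore_host_ram / GiB:.0f} GiB (+{extra:.0%})")

With those made-up numbers the bluestore host needs roughly 28% more RAM
for the same effective cache; adjust the assumptions to taste.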
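
And to illustrate point 3: with filestore plus an SSD journal every write,
whatever its size, is acknowledged from the journal first. With bluestore,
as I understand the defaults, only writes below
bluestore_prefer_deferred_size_hdd (32 KiB on HDD OSDs) get deferred through
the WAL; larger ones go straight to the data disk. A tiny sketch, write
sizes arbitrary:

PREFER_DEFERRED_HDD = 32 * 1024  # bluestore_prefer_deferred_size_hdd default

def first_landing(write_bytes: int) -> str:
    """Roughly where a bluestore write on an HDD OSD lands first."""
    if write_bytes < PREFER_DEFERRED_HDD:
        return "deferred via the WAL (SSD), flushed to the HDD later"
    return "straight to the HDD data device"

for size_kib in (4, 16, 64, 1024):
    print(f"{size_kib:>4} KiB -> {first_landing(size_kib * 1024)}")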


In short, if you're willing to basically treat your Ceph cluster as an
appliance that can be disposed of after 5 years, instead of a
continuously upgraded and expanded storage solution, you can do just that:
install filestore OSDs and/or cache tiering now (as sketched below) and
never upgrade, at least not to a version where they become unsupported.
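
For reference, the cache-tiering setup I mean is just the standard writeback
tier from the documentation. A minimal sketch (pool names, hit-set numbers
and the size target are placeholders, not from any real cluster):

import subprocess

def ceph(*args):
    # thin wrapper around the ceph CLI; aborts on the first error
    subprocess.run(["ceph", *args], check=True)

COLD = "cephfs_data"   # placeholder: existing HDD-backed base pool
HOT = "cache-ssd"      # placeholder: existing SSD-backed pool

# standard writeback cache-tier wiring
ceph("osd", "tier", "add", COLD, HOT)
ceph("osd", "tier", "cache-mode", HOT, "writeback")
ceph("osd", "tier", "set-overlay", COLD, HOT)

# the tiering agent needs hit-set tracking and a size target to decide
# when to flush and evict
ceph("osd", "pool", "set", HOT, "hit_set_type", "bloom")
ceph("osd", "pool", "set", HOT, "hit_set_count", "12")
ceph("osd", "pool", "set", HOT, "hit_set_period", "14400")
ceph("osd", "pool", "set", HOT, "target_max_bytes", str(400 * 2**30))

The flush/evict thresholds (cache_target_dirty_ratio and friends) of course
need tuning per workload.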


In my case I'm currently undecided between doing something like the above
or going all-SSD (if that's affordable; maybe the RAM savings will help)
and thus bypassing at least the bluestore performance issues.

Regards,

Christian

On Tue, 2 Oct 2018 19:28:13 +0200 jesper@xxxxxxxx wrote:

> Hi.
> 
> Based on some recommendations we have set up our CephFS installation using
> bluestore*. We're trying to get a strong replacement for a "huge" xfs+NFS
> server - 100TB-ish in size.
> 
> The current setup is a sizeable Linux host with 512GB of memory, one large
> Dell MD1200 or MD1220 (100TB+), and a Linux kernel NFS server.
> 
> Since our "hot" dataset is < 400GB we can actually serve the hot data
> directly out of the host page cache and never really touch the "slow"
> underlying drives, except when new bulk data is written, in which case a
> PERC with BBWC absorbs it.
> 
> In the CephFS + Bluestore world, Ceph is "deliberately" bypassing the host
> OS page cache, so even with 4-5 x 256GB of memory** in the OSD hosts
> it is really hard to create a synthetic test where the hot data does not
> end up being read from the underlying disks. Yes, the
> client-side page cache works very well, but in our scenario we have 30+
> hosts pulling the same data over NFS.
> 
> Is bluestore just a "bad fit", while filestore "should" do the right thing?
> Is the recommendation to put an SSD "overlay" on the slow drives?
> 
> Thoughts?
> 
> Jesper
> 
> * Bluestore should be the new and shiny future - right?
> ** Total mem 1TB+
> 
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


