Re: using Bcache on blueStore

Jorge Pinilla López <jorpilo@xxxxxxxxx> · Fri, 13 Oct 2017 01:54:55 +0200

Well, I wouldn't use bcache on FileStore at all. First, there are the problems you described, and second, and far more important, you get double writes (in FileStore, data was written to the journal and to the storage disk at the same time). If the journal and the data disk were the same device, the speed was divided by two, giving really poor performance.
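To put rough numbers on the double-write penalty, here is a minimal sketch under a simplified model: every client write lands once in the journal and once in the data store, and seek overhead is ignored (on a spinning disk, seeking between journal and data makes co-location even worse than half). The 160 MB/s HDD figure is a made-up example, not a measurement.

```python
def effective_write_mbps(disk_mbps: float, journal_on_same_disk: bool) -> float:
    """Client-visible write throughput of one OSD in a simplified FileStore model.

    Each client write is written twice: to the journal and to the data store.
    If both live on the same disk, the two copies share its bandwidth.
    """
    if journal_on_same_disk:
        # Both copies compete for the same disk, so the client gets about half.
        return disk_mbps / 2
    # Journal on a separate (assumed fast enough) device: the data disk only
    # absorbs one copy of each write.
    return disk_mbps


if __name__ == "__main__":
    hdd_mbps = 160.0  # hypothetical sequential write speed of one HDD
    print(effective_write_mbps(hdd_mbps, journal_on_same_disk=True))
    print(effective_write_mbps(hdd_mbps, journal_on_same_disk=False))
```

The separate-journal case only holds if the journal device itself keeps up with the write rate, which is why FileStore journals usually went on SSDs.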

In BlueStore things change quite a lot. First, there are no double writes and there is no "journal" (well, there is something called the WAL, but it's not used in the same way): data goes directly to the data disk, and you only write a small amount of metadata and commit it to the DB. Rebalancing and scrub go through RocksDB rather than a file system, which makes them much simpler and more efficient, so you shouldn't run into all the problems you had with FileStore.

In addition, cache tiering has been deprecated in Red Hat Ceph Storage, so I personally wouldn't use something that the developers and support have deprecated.

-------- Original message --------
From: Marek Grzybowski <marek.grzybowski@xxxxxxxxx> 
Date: 13/10/17  12:22 AM  (GMT+01:00) 
To: Jorge Pinilla López <jorpilo@xxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx 
Subject: Re: [ceph-users] using Bcache on blueStore 

On 12.10.2017 20:28, Jorge Pinilla López wrote:
> Hey all!
> I have a Ceph cluster with multiple HDDs and one really fast SSD per host (30GB per OSD).
> 
> I have been thinking about it, and all the docs say that I should give all the SSD space to RocksDB, so I would have an HDD for data and a 30GB partition for RocksDB.
> 
> But it occurred to me that if the OSD isn't full, maybe I am not using all the space on the SSD, or maybe I would prefer to have a really small amount of hot k/v and metadata plus the data itself on a really fast device, rather than just storing all the cold metadata there.
> 
> So I thought about using bcache to turn the SSD into a cache: since metadata and k/v are usually hot, they should end up in the cache. But this doesn't guarantee that k/v and metadata are actually always on the SSD, because under heavy cache load they can be pushed out (for example by really big data files).
> 
> So I came up with the idea of setting up small 5-10GB partitions for the hot RocksDB and using the rest as a cache. That way I make sure the really hot metadata is always on the SSD, and the colder metadata should also stay on the SSD (in bcache) unless it is really frozen, in which case it would be pushed to the HDD. It also doesn't make any sense to have metadata that you never use taking up space on the SSD; I'd rather use that space to store hotter data.
> 
> This should also make writes faster, and in BlueStore we don't have the double-write problem, so it should work fine.
> 
> What do you think about this? Does it have any downsides? Is there any other way?
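The SSD split Jorge proposes (a fixed 5-10GB RocksDB partition per OSD, with the rest of the 30GB share used by bcache) works out as below. This is only arithmetic: the `split_ssd_per_osd` helper and `SsdSplit` type are hypothetical and do not configure Ceph or bcache. Note that in BlueStore, RocksDB data that no longer fits on the DB partition spills over to the slow main device, so a small DB partition trades guaranteed SSD placement for cache space.

```python
from dataclasses import dataclass


@dataclass
class SsdSplit:
    db_gb: float     # fixed partition given to RocksDB (block.db)
    cache_gb: float  # remainder handed to bcache as a cache device


def split_ssd_per_osd(ssd_per_osd_gb: float, db_gb: float) -> SsdSplit:
    """Hypothetical helper: split one OSD's SSD share into a DB part and a cache part."""
    if not 0 < db_gb <= ssd_per_osd_gb:
        raise ValueError("db_gb must be positive and fit inside the SSD share")
    return SsdSplit(db_gb=db_gb, cache_gb=ssd_per_osd_gb - db_gb)


if __name__ == "__main__":
    # The two ends of the proposed 5-10GB DB range, with 30GB of SSD per OSD.
    for db in (5, 10):
        print(split_ssd_per_osd(30, db))
```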

Hi Jorge
  I was inexperienced and tried bcache on old FileStore once. It was bad,
mostly because bcache does not have any typical disk scheduling algorithm.
So when scrub or rebalance was running, latency on such storage was very high and unpredictable.
The OSD daemon could not set any ioprio for disk reads or writes, and additionally
the bcache cache was poisoned by scrub/rebalance.
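The cache poisoning Marek describes can be illustrated with a generic LRU cache: a hot working set that fits in the cache gets a perfect hit rate until one pass of scrub-like reads over cold blocks evicts all of it. This is a minimal sketch with made-up sizes, not a model of bcache, which has its own replacement policy and a `sequential_cutoff` setting intended to let large sequential IO bypass the cache.

```python
from collections import OrderedDict


class LRUCache:
    """Plain least-recently-used cache of block IDs with hit/miss counters."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks: "OrderedDict[str, bool]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, block: str) -> None:
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used


def hot_hit_rate(cache: LRUCache, hot_blocks: list) -> float:
    """Reset the counters, read every hot block once, return the hit rate."""
    cache.hits = cache.misses = 0
    for block in hot_blocks:
        cache.access(block)
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    cache = LRUCache(capacity=100)
    hot = [f"hot-{i}" for i in range(50)]  # hot metadata, fits in the cache
    hot_hit_rate(cache, hot)               # warm-up pass
    print("before scrub:", hot_hit_rate(cache, hot))
    for i in range(1000):                  # one scrub pass over cold blocks
        cache.access(f"scrub-{i}")
    print("after scrub:", hot_hit_rate(cache, hot))
```

After the scrub pass the cache holds only the last 100 scrub blocks, so every hot read misses and goes to the HDD until the working set is rebuilt.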

Fortunately for me, it is very easy to do a rolling replacement of OSDs.
I now use some SSD partitions for journals and what's left for pure SSD storage.
This works really well.

If I ever need a cache, I will use cache tiering instead.

-- 
  Kind Regards
    Marek Grzybowski

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com