Generally on bcache &, for that matter, lvmcache & dm-writeboost: we did
extensive "power off" testing with all of them and reliably managed to break
them on our hardware setup. The test loop was roughly:

  while true: boot box; start writing & stress metadata updates (i.e. make
  piles of files and unlink them, or find something else that's picky about
  write ordering); let it run for a bit; yank power; power on.

(The metadata-stress part is sketched at the end of this mail.)

This never survived more than a night without badly corrupting some XFS
filesystem. We did the same testing without caching and could not reproduce
the corruption. It may have been a quirk of our particular setup, and I get
the impression that others use these caches and sleep well at night, but I'd
recommend testing under the most unforgiving circumstances you can think of
before proceeding.

-KJ

On Thu, Oct 12, 2017 at 4:54 PM, Jorge Pinilla López <jorpilo@xxxxxxxxx> wrote:
> Well, I wouldn't use bcache on FileStore at all.
> First, there are the problems you have described, and second, and more
> important, you got double writes (in FileStore, data was written to the
> journal and to the storage disk at the same time); if the journal and the
> data disk were the same device, throughput was halved and performance was
> really bad.
>
> In BlueStore things change quite a lot. First, there are no double writes:
> there is no "journal" (well, there is something called the WAL, but it's
> not used in the same way). Data goes directly to the data disk, and you only
> write a little metadata and make a commit into the DB. Rebalancing and scrub
> go through RocksDB rather than a file system, which makes them much simpler
> and more effective, so you aren't supposed to have all the problems you had
> with FileStore.
>
> In addition, cache tiering has been deprecated in Red Hat Ceph Storage, so I
> personally wouldn't use something deprecated by its developers and support.
>
>
> -------- Original message --------
> From: Marek Grzybowski <marek.grzybowski@xxxxxxxxx>
> Date: 13/10/17 12:22 AM (GMT+01:00)
> To: Jorge Pinilla López <jorpilo@xxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx
> Subject: Re: using Bcache on blueStore
>
> On 12.10.2017 20:28, Jorge Pinilla López wrote:
>> Hey all!
>> I have a Ceph cluster with multiple HDDs and one really fast SSD
>> (30 GB per OSD) per host.
>>
>> I have been thinking about this: all the docs say I should give all the
>> SSD space to RocksDB, so each OSD would have an HDD for data and a 30 GB
>> partition for RocksDB.
>>
>> But it occurred to me that if the OSD isn't full, I may not be using all
>> the space on the SSD, and I might prefer to keep a really small amount of
>> hot k/v and metadata plus the hot data itself on a really fast device,
>> rather than storing all the cold metadata there.
>>
>> So I thought of using bcache to turn the SSD into a cache; since metadata
>> and k/v are usually hot, they should end up in the cache. But that doesn't
>> guarantee that the k/v and metadata are actually always on the SSD,
>> because under heavy cache load they can be pushed out (for example by
>> really big data files).
>>
>> So I came up with the idea of setting aside a small 5-10 GB partition for
>> the hot RocksDB and using the rest as a cache. That way I make sure the
>> really hot metadata is always on the SSD, and the colder metadata should
>> also be on the SSD (via bcache) unless it is really freezing cold, in
>> which case it would be pushed to the HDD. It also doesn't make any sense
>> to have metadata that you never use taking up space on the SSD; I'd rather
>> use that space to store hotter data.
>>
>> This would also make writes faster, and in BlueStore we don't have the
>> double-write problem, so it should work fine.
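>>
>> Roughly, what I have in mind per OSD would be something like the sketch
>> below (just a sketch: the device names, partition sizes and the writeback
>> mode are only examples, and I haven't tested any of this):
>>
>>   # carve one OSD's 30 GB share of the SSD: a small partition for the hot
>>   # RocksDB (block.db) and the rest for a bcache cache set (partition
>>   # numbers assume the first OSD on an empty SSD)
>>   sgdisk -n 1:0:+8G /dev/nvme0n1      # block.db for this OSD
>>   sgdisk -n 2:0:0   /dev/nvme0n1      # bcache cache for this OSD
>>
>>   # attach the HDD as backing device to the cache set; the cached device
>>   # shows up as /dev/bcache0 once udev registers it
>>   make-bcache -C /dev/nvme0n1p2 -B /dev/sdb
>>   echo writeback > /sys/block/bcache0/bcache/cache_mode
>>
>>   # create the BlueStore OSD on the cached device, with the DB on the
>>   # dedicated SSD partition
>>   ceph-volume lvm create --bluestore --data /dev/bcache0 \
>>       --block.db /dev/nvme0n1p1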
>>
>> What do you think about this? Does it have any downside? Is there any
>> other way?
>
> Hi Jorge,
> I was inexperienced and tried bcache on an old FileStore setup once. It was
> bad, mostly because bcache does not have any typical disk scheduling
> algorithm, so when a scrub or rebalance was running, latency on that
> storage was very high and unpredictable.
> The OSD daemon could not set any ioprio for disk reads or writes, and
> additionally the bcache cache was poisoned by scrub/rebalance traffic.
>
> Fortunately for me, it is very easy to rolling-replace OSDs.
> I now use some SSD partitions for the journal and what is left for pure
> SSD storage. This works really well.
>
> If I ever need a cache, I will use cache tiering instead.
>
> --
> Kind Regards
> Marek Grzybowski

--
Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
SRE, Medallia Inc
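The metadata-stress part of the power-off loop mentioned above can be
reproduced with something as simple as the sketch below (the mount point and
file count are placeholders, not our exact script; any create/sync/unlink
loop that is picky about write ordering should do):

  #!/bin/bash
  # Hammer the filesystem on the cached device with metadata updates until
  # power is yanked: create piles of small files, flush, then unlink them.
  DIR=/mnt/xfs-under-test/stress   # placeholder: a directory on the cached fs
  mkdir -p "$DIR"
  while true; do
      for i in $(seq 1 10000); do
          echo "some data" > "$DIR/file.$i"
      done
      sync                         # force the writes out so ordering matters
      rm -f "$DIR"/file.*
  done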