Re: using Bcache on blueStore

This goes generally for bcache and, for that matter, for lvmcache and
dm-writeboost.

We did extensive "power off" testing with all of them and reliably
managed to break each of them on our hardware setup.

while true: boot box; start writing and stress metadata updates (e.g.
make piles of files and unlink them, or anything else that is picky
about write ordering); let it run for a bit; yank power; power back on.
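A minimal sketch of the metadata-churn step (the path and counts are
arbitrary; any workload that is picky about write ordering would do):

    #!/bin/bash
    # Hammer metadata: create piles of small files, flush, then unlink them.
    DIR=/mnt/test/churn    # a directory on the cached filesystem (assumed path)
    mkdir -p "$DIR"
    while true; do
        for i in $(seq 1 1000); do
            echo "payload $i" > "$DIR/f$i"
        done
        sync                # push the metadata out before the unlinks
        rm -f "$DIR"/f*
    done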

This never survived for more than a night without badly corrupting
some XFS filesystem. We ran the same testing without caching and could
not reproduce the corruption.

This may have been a quirk of our particular setup; I get the
impression that others use these caches and sleep well at night. Still,
I'd recommend testing under the most unforgiving circumstances you can
think of before proceeding.

-KJ

On Thu, Oct 12, 2017 at 4:54 PM, Jorge Pinilla López <jorpilo@xxxxxxxxx> wrote:
> Well, I wouldn't use bcache on FileStore at all.
> First, there are the problems you have described, and second (and more
> important) you got double writes: in FileStore, data was written to the
> journal and to the storage disk at the same time. If the journal and
> the data disk were the same device, throughput was effectively halved,
> giving really bad results.
>
> In BlueStore things change quite a lot. First, there are no double
> writes: there is no "journal" (well, there is something called the WAL,
> but it's not used in the same way). Data goes directly to the data
> disk, and you only write a little metadata and commit it to the DB.
> Rebalancing and scrubbing go through RocksDB rather than a file system,
> which makes them much simpler and more effective, so you aren't
> supposed to have all the problems that you had with FileStore.
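> As a concrete sketch of that layout (device names are just examples),
> the DB can be placed on the SSD when the OSD is created, e.g. with
> ceph-disk:
>
>     # HDD holds the data; a small SSD partition holds RocksDB (block.db)
>     ceph-disk prepare --bluestore --block.db /dev/nvme0n1p1 /dev/sdb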
>
> In addition, cache tiering has been deprecated in Red Hat Ceph Storage,
> so I personally wouldn't use something the developers and support have
> deprecated.
>
>
> -------- Original Message --------
> From: Marek Grzybowski <marek.grzybowski@xxxxxxxxx>
> Date: 13/10/17 12:22 AM (GMT+01:00)
> To: Jorge Pinilla López <jorpilo@xxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  using Bcache on blueStore
>
> On 12.10.2017 20:28, Jorge Pinilla López wrote:
>> Hey all!
>> I have a ceph with multiple HDD and 1 really fast SSD with (30GB per OSD)
>> per host.
>>
>> I have been thinking about this: all the docs say I should give all
>> the SSD space to RocksDB, so I would have the data on the HDD and a
>> 30GB partition for RocksDB.
>>
>> But it occurred to me that if the OSD isn't full, I may not be using
>> all the space on the SSD; and I might prefer to keep a really small
>> amount of hot k/v and metadata, plus the data itself, on a really fast
>> device, rather than just storing all the cold metadata there.
>>
>> So I thought about using bcache to turn the SSD into a cache: since
>> metadata and k/v are usually hot, they should end up in the cache. But
>> this doesn't guarantee that the k/v data and metadata are actually
>> always on the SSD, because under heavy cache load they can be pushed
>> out (e.g. by really big data files).
>>
>> So I came up with the idea of using a small 5-10GB partition for the
>> hot RocksDB and the rest of the SSD as a cache. That way I make sure
>> the really hot metadata is always on the SSD, and the colder metadata
>> should also stay on the SSD (via bcache) unless it is really freezing
>> cold, in which case it gets pushed to the HDD. It also doesn't make
>> any sense to have metadata you never use taking up SSD space; I'd
>> rather use that space to store hotter data.
>>
>> This would also make writes faster, and in BlueStore we don't have the
>> double-write problem, so it should work fine.
>>
>> What do you think about this? Does it have any downside? Is there any
>> other way?
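>> A sketch of the split I have in mind (device names are illustrative;
>> the 5-10GB partition would be block.db, the rest a bcache cache in
>> front of the HDD):
>>
>>     # turn the rest of the SSD into a bcache cache set
>>     make-bcache -C /dev/nvme0n1p2
>>     # turn the HDD into a bcache backing device
>>     make-bcache -B /dev/sdb
>>     # attach the cache set (UUID from bcache-super-show) to the backing device
>>     echo <cset-uuid> > /sys/block/bcache0/bcache/attach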
>
> Hi Jorge
>   I was inexperienced and tried bcache on the old FileStore once. It
> was bad, mostly because bcache does not implement any of the typical
> disk scheduling algorithms. So when a scrub or rebalance was running,
> latency on that storage was very high and unpredictable. The OSD daemon
> could not apply any ioprio to disk reads or writes, and additionally
> the bcache cache was poisoned by the scrub/rebalance traffic.
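> (bcache does have a tunable that may reduce the poisoning; a sketch,
> assuming the device shows up as bcache0:
>
>     # sequential IO larger than the cutoff bypasses the cache, so a low
>     # cutoff makes big scrub/rebalance streams skip it
>     echo 1M > /sys/block/bcache0/bcache/sequential_cutoff
>
> though it does nothing for the missing IO scheduling.)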
>
> Fortunately for me, it is very easy to do a rolling replacement of the
> OSDs. I now use some SSD partitions for journals and whatever is left
> for pure SSD storage. This works really well.
>
> If I ever need a cache, I will use cache tiering instead.
>
>
> --
>   Kind Regards
>     Marek Grzybowski



-- 
Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
SRE, Medallia Inc
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



