Re: using bcache on BlueStore

Okay, I get your point: it's much safer with no cache at all.

I am speaking from total ignorance, so please correct me if I say something wrong.

What I don't really understand is how inefficiently the DB space is used.

1- When the OSD is new, the DB partition might be almost empty, yet it is never used for storing any actual data at all, so reads and writes could be sped up by using that free space.

2- When the OSD is full, you might have tons of cold metadata that is never used taking up all the space on the SSD. So maybe it wouldn't be a bad idea to push that metadata to a cold DB on the HDD and try to bring the actual hot data onto the SSD, so reads (or maybe writes) could be improved. Having a hot ratio on the metadata could be interesting: I know metadata is far more important than the actual data, but if the metadata is freezing cold, I don't see the value of it using up SSD space.
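
As a concrete way of checking this, I think you can ask a running OSD over its admin socket how much of the DB device is actually occupied. A minimal sketch (osd.0 is just an example id, and the exact counter names may differ between releases):

    # Dump the perf counters of a running OSD and pick out the BlueFS
    # numbers showing how much of the DB device RocksDB really uses.
    ceph daemon osd.0 perf dump | python -m json.tool | grep -E 'db_(total|used)_bytes'

If db_used_bytes stays far below db_total_bytes, that would suggest the SSD space is mostly sitting idle.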

I know BlueStore also has a system cache, as I mentioned in this email:

https://www.spinics.net/lists/ceph-users/msg39426.html

but that cache doesn't include any data at all either, so it's hard for me to understand how BlueStore can be so fast if it's limited to HDD speed.

If someone knows how RocksDB on the SSD actually works, how BlueStore keeps its speed up, or why the whole metadata set should be on a separate SSD, please tell me :) I am really trying to understand this topic.

On 14/10/2017 at 02:39, Kjetil Joergensen wrote:
This applies generally to bcache and, for that matter, to lvmcache and dm-writeboost.

We did extensive "power off" testing with all of them and reliably
managed to break it on our hardware setup.

while true; boot box; start writing & stress metadata updates (i.e.
make piles of files and unlink them, or you could find something else
that's picky about write ordering); let it run for a bit; yank power;
power on;
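
The churn part, spelled out, was something along these lines (mount point and file count are arbitrary):

    # Metadata-heavy churn on top of the cached device: make piles of
    # small files, sync, unlink them all, repeat until the power is yanked.
    while true; do
        for i in $(seq 1 10000); do
            echo x > /mnt/test/f.$i
        done
        sync
        rm -f /mnt/test/f.*
    done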

This never survived for more than a night without badly corrupting some
XFS filesystem. We did the same testing without caching and could not
reproduce the corruption.

This may have been a quirk of our particular setup; I get the impression
that others use it and sleep well at night, but I'd recommend testing it
under the most unforgiving circumstances you can think of before
proceeding.

-KJ

On Thu, Oct 12, 2017 at 4:54 PM, Jorge Pinilla López <jorpilo@xxxxxxxxx> wrote:
Well, I wouldn't use bcache on FileStore at all.
First, there are the problems you have described, and second, and far more
important, you got double writes (in FileStore, data was written to the
journal and to the storage disk at the same time), so if the journal and
the data disk were the same device, speed was cut in half, giving really
bad performance.

In BlueStore things change quite a lot. First, there are no double writes:
there is no "journal" (well, there is something called a WAL, but it's not
used the same way); data goes directly to the data disk, and you only write
a little metadata and make a commit into the DB. Rebalancing and scrubbing
go through RocksDB rather than a filesystem, which makes them much simpler
and more effective, so you shouldn't have all the problems you had with
FileStore.

In addition, cache tiering has been deprecated in Red Hat Ceph Storage, so
I personally wouldn't use something deprecated by its developers and
support.


-------- Original message --------
From: Marek Grzybowski <marek.grzybowski@xxxxxxxxx>
Date: 13/10/17 12:22 AM (GMT+01:00)
To: Jorge Pinilla López <jorpilo@xxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx
Subject: Re: using bcache on BlueStore

On 12.10.2017 20:28, Jorge Pinilla López wrote:
Hey all!
I have a Ceph cluster with multiple HDDs and one really fast SSD (30GB per
OSD) per host.

I have been thinking about it, and all the docs say I should give all the
SSD space to RocksDB, so I would have the data on the HDD and a 30GB SSD
partition for RocksDB.
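
For reference, that straightforward layout would look something like this with ceph-disk (device names are just examples, and I believe ceph-disk on Luminous takes --block.db for BlueStore):

    # /dev/sdb is the HDD for data; /dev/nvme0n1p1 is a pre-made 30GB
    # SSD partition that will hold RocksDB (block.db).
    ceph-disk prepare --bluestore /dev/sdb --block.db /dev/nvme0n1p1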

But it occurred to me that if the OSD isn't full, maybe I am not using all
the space on the SSD; or maybe I would rather have a really small amount of
hot k/v data and metadata, plus the data itself, on a really fast device
than just store all the cold metadata there.

So I thought of using bcache to turn the SSD into a cache: since metadata
and k/v data are usually hot, they should end up in the cache. But this
doesn't guarantee that the k/v data and metadata are actually always on the
SSD, because under heavy cache load they can be pushed out (for example by
really big data files).

So I came up with the idea of setting aside a small 5-10GB partition for
the hot RocksDB and using the rest as a cache. That way I make sure the
really hot metadata is always on the SSD, and the colder metadata should
also be on the SSD (via bcache) as long as it is not really freezing, in
which case it would be pushed to the HDD. It also doesn't make any sense to
have metadata you never use taking up space on the SSD; I would rather use
that space to store hotter data.
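
The bcache side of that idea would be roughly this (a sketch; device names are examples):

    # /dev/sda4 = leftover SSD space as the caching device,
    # /dev/sdb  = the HDD as the backing device.
    make-bcache -C /dev/sda4
    make-bcache -B /dev/sdb
    # Attach the cache set to the backing device using the set uuid
    # printed by 'make-bcache -C':
    echo <cset-uuid> > /sys/block/bcache0/bcache/attach
    # The OSD data would then go on /dev/bcache0, with the small
    # 5-10GB SSD partition given to BlueStore as block.db.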

This would also make writes faster, and in BlueStore we don't have the
double-write problem, so it should work fine.

What do you think about this? Does it have any downside? Is there any
other way?
Hi Jorge
  I was inexperienced and tried bcache on the old FileStore once. It was bad.
Mostly because bcache does not have any typical disk scheduling algorithm,
so when a scrub or rebalance was running, latency on that storage was very
high and unpredictable. The OSD daemon could not set any I/O priority for
disk reads or writes, and additionally the bcache cache was poisoned by the
scrub/rebalance traffic.

Fortunately for me, it is very easy to do a rolling replacement of OSDs.
I now use some SSD partitions for journals and whatever is left for pure
SSD storage. This works really well.

If I ever need a cache, I will use cache tiering instead.


--
  Kind Regards
    Marek Grzybowski










--

Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer Engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
