Just to add, the main reason it seems to make a difference is the metadata
updates that live on the OSD's data disk itself. When you are doing small
block writes, these metadata updates seem to take almost as long as the
actual data, so although the writes are getting coalesced, overall
performance isn't much better.
I did a blktrace a week ago while writing 500MB in 64k blocks to an OSD. You
could see that the actual data was flushed to the OSD in a couple of
seconds, while another 30 seconds were spent writing out metadata and doing
EXT4/XFS journal writes.
Normally I have found flashcache to perform really poorly, as it does
everything in 4kb blocks, meaning that when you start throwing larger blocks
at it, it can actually slow things down. However, for the purposes of OSDs
you can set the IO cutoff size limit to around 16-32kb, and then it should
only cache the metadata updates.
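If I have the flashcache tunables right, that cutoff is the sequential skip
threshold sysctl, something like the below (the cache device name depends on
how you created it, "cachedev" is just a placeholder):

    # Skip (don't cache) sequential IO larger than 32kb; 0 (the default) caches everything.
    # Small random metadata writes stay under the threshold and still get cached.
    sysctl -w dev.flashcache.cachedev.skip_seq_thresh_kb=32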
I'm hoping to do some benchmarks before and after flashcache on an
SSD-journaled OSD this week, so will post results when I have them.
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
Brendan Moloney
Sent: 23 March 2015 21:02
To: Noah Mehl
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] OSD + Flashcache + udev + Partition uuid
This would be in addition to having the journal on SSD. The journal doesn't
help at all with small random reads and has a fairly limited ability to
coalesce writes.
In my case, the SSDs we are using for journals should have plenty of
bandwidth/IOPs/space to spare, so I want to see if I can get a little more
out of them.
-Brendan
________________________________________
From: Noah Mehl [noahmehl@xxxxxxxxxxxxxxxxxx]
Sent: Monday, March 23, 2015 1:45 PM
To: Brendan Moloney
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] OSD + Flashcache + udev + Partition uuid
We deployed with just putting the journal on an SSD directly; why would this
not work for you? Just wondering really :)
Thanks!
~Noah
On Mar 23, 2015, at 4:36 PM, Brendan Moloney <moloney@xxxxxxxx> wrote:
I have been looking at the options for SSD caching for a bit now. Here is
my take on the current options:
1) bcache - Seems to have lots of reliability issues mentioned on the
mailing list, with little sign of improvement.
2) flashcache - Seems to be no longer (or only minimally?)
developed/maintained; instead folks are working on the fork enhanceio.
3) enhanceio - Fork of flashcache. Dropped the ability to skip caching on
sequential writes, which many folks have claimed is important for Ceph OSD
caching performance (see: https://github.com/stec-inc/EnhanceIO/issues/32).
4) LVM cache (dm-cache) - There is now a user-friendly way to use dm-cache,
through LVM. Allows sequential writes to be skipped. You need a pretty
recent kernel. A rough setup sketch is below, after this list.
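To give an idea of what the LVM route looks like, the setup is roughly as
follows (the device, VG/LV names and sizes are placeholders for
illustration, not tested values):

    # Add the SSD to the volume group that holds the OSD's data LV
    pvcreate /dev/nvme0n1p1
    vgextend vg_osd /dev/nvme0n1p1
    # Carve out cache data and cache metadata LVs on the SSD
    lvcreate -L 50G -n osd_cache vg_osd /dev/nvme0n1p1
    lvcreate -L 1G -n osd_cache_meta vg_osd /dev/nvme0n1p1
    # Combine them into a cache pool, then attach it to the OSD's data LV
    lvconvert --type cache-pool --poolmetadata vg_osd/osd_cache_meta vg_osd/osd_cache
    lvconvert --type cache --cachepool vg_osd/osd_cache vg_osd/osd_data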
I am going to be trying out LVM cache on my own cluster in the next few
weeks. I will share my results here on the mailing list. If anyone else has
tried it out I would love to hear about it.
-Brendan
In long-term use I also had some issues with flashcache and enhanceio.
I've noticed frequent slow requests.
Andrei
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com