Re: Guest sync write iops so poor.


 



Also take a look at Galera cluster. You can relax flushing to disk as long as your nodes don't all go down at the same time.
(And when a node comes back up after a crash, you should trash it before it rejoins the cluster.)
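
A minimal my.cnf sketch of what that relaxed flushing looks like on a Galera node (paths, node addresses and the exact values are placeholders, adjust for your own setup):

    [mysqld]
    # Durability comes from the other Galera nodes, so per-commit
    # flushing can be relaxed on each individual node.
    innodb_flush_log_at_trx_commit = 2   # flush the redo log ~once per second
    sync_binlog = 0                      # let the OS decide when to flush the binlog
    wsrep_on = ON
    wsrep_provider = /usr/lib/galera/libgalera_smm.so
    wsrep_cluster_address = gcomm://node1,node2,node3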

Jan


> On 26 Feb 2016, at 11:01, Nick Fisk <nick@xxxxxxxxxx> wrote:
> 
> I guess my question was more around what your final workload looks like. If
> it's the same as the SQL benchmarks, then you are not going to get much
> better performance than you do now, aside from trying some of the tuning
> options I mentioned, which might get you an extra 100 IOPS.
> 
> The only other option would be to look at some sort of client-side SSD
> caching (flashcache, bcache, etc.) of the RBD. These are not ideal, but it
> might be your only option for getting near local sync write performance.
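> 
> Roughly, with bcache, it would look something like the below (untested
> sketch, device names are only examples - a local SSD partition as the cache,
> /dev/rbd0 as the backing device):
> 
>     # create the cache set on the local SSD and register /dev/rbd0 as backing
>     make-bcache -C /dev/nvme0n1p1 -B /dev/rbd0
>     # switch to writeback so sync writes are acked from the local SSD
>     echo writeback > /sys/block/bcache0/bcache/cache_mode
>     # then put the filesystem on /dev/bcache0 instead of /dev/rbd0
> 
> Bear in mind that writeback to a single local SSD means you can lose data if
> that host or SSD dies before the dirty blocks are flushed back to the RBD.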
> 
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Huan Zhang
>> Sent: 26 February 2016 09:30
>> To: Nick Fisk <nick@xxxxxxxxxx>
>> Cc: josh durgin <josh.durgin@xxxxxxxxxxx>; ceph-users <ceph-
>> users@xxxxxxxx>
>> Subject: Re:  Guest sync write iops so poor.
>> 
>> Hi Nick,
>> The DB's IO pattern depends on config; take MySQL for example.
>> With innodb_flush_log_at_trx_commit = 1, MySQL will sync after each
>> transaction, like:
>> write
>> sync
>> write
>> sync
>> ...
>> 
>> innodb_flush_log_at_trx_commit = 5,
>> write
>> write
>> write
>> write
>> write
>> sync
>> 
>> innodb_flush_log_at_trx_commit = 0,
>> write
>> write
>> ...
>> one second later.
>> sync.
>> 
>> 
>> This may not be completely accurate, but it's more or less the pattern.
>> We tested MySQL TPS with innodb_flush_log_at_trx_commit = 1 and got very
>> poor performance, even though we can reach very high O_DIRECT randwrite
>> IOPS with fio.
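>>
>> For reference, the two fio patterns we are comparing look roughly like this
>> (sketch only, the device path is just from our test setup):
>>
>> # pure O_DIRECT random write, no per-IO flush - this is the fast case
>> fio --name=directwrite --filename=/dev/rbd0 --rw=randwrite --bs=4k \
>>     --direct=1 --ioengine=libaio --iodepth=64 --runtime=60
>>
>> # fsync after every write at queue depth 1 - close to what
>> # innodb_flush_log_at_trx_commit=1 does to the redo log
>> fio --name=syncwrite --filename=/dev/rbd0 --rw=write --bs=4k \
>>     --direct=1 --ioengine=libaio --iodepth=1 --fsync=1 --runtime=60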
>> 
>> 
>> 
>> 2016-02-26 16:59 GMT+08:00 Nick Fisk <nick@xxxxxxxxxx>:
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>>> Huan Zhang
>>> Sent: 26 February 2016 06:50
>>> To: Jason Dillaman <dillaman@xxxxxxxxxx>
>>> Cc: josh durgin <josh.durgin@xxxxxxxxxxx>; Nick Fisk <nick@xxxxxxxxxx>;
>>> ceph-users <ceph-users@xxxxxxxx>
>>> Subject: Re:  Guest sync write iops so poor.
>>> 
>>> rbd engine with fsync=1 seems stuck.
>>> Jobs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>>> 1244d:10h:39m:18s]
>>> 
>>> But fio against /dev/rbd0 with sync=1 direct=1 ioengine=libaio iodepth=64
>>> gets very high IOPS, ~35K, similar to plain direct writes.
>>> 
>>> I'm confused by that result. IMHO, Ceph could just ignore the sync cache
>>> command, since it always uses sync writes to the journal, right?
>> 
>> Even if the data is not sync'd to the data storage part of the OSD, the data still
>> has to be written to the journal and this is where the performance limit lies.
>> 
>> The very nature of SDS means that you are never going to achieve the same
>> latency as you do to a local disk: even if the software side introduced no
>> extra latency, the network latency alone would severely limit your sync
>> performance.
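>>
>> As a rough back-of-envelope example (the numbers here are made up purely
>> for illustration): a queue-depth-1 sync write has to wait for the client to
>> primary OSD hop, the replication hops and the journal write before it is
>> acked. If that adds up to around 1ms per write, that single stream is
>> capped at roughly 1 / 0.001s = ~1000 IOPS no matter how fast the disks
>> underneath are, whereas a local SSD acking in, say, 0.05ms could do ~20,000.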
>> 
>> Do you know the IO pattern the DBs generate? I know you can switch most
>> DBs to flush with O_DIRECT instead of sync; it might be that this helps in
>> your case.
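>>
>> For InnoDB that would be something along the lines of the below (just a
>> sketch, check the docs for your version):
>>
>> [mysqld]
>> innodb_flush_method = O_DIRECT
>>
>> Note this mainly changes how the data files are written; the redo log flush
>> on commit is still governed by innodb_flush_log_at_trx_commit.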
>> 
>> Also check out the tech talk from last month about high performance
>> databases on Ceph. The presenter gave the impression that, at least in their
>> case, not every write was a sync IO. So your results could possibly matter less
>> than you think.
>> 
>> Also, please search the lists and past presentations about reducing write
>> latency. There are a few things you can do, like disabling logging and
>> setting some kernel parameters to stop the CPUs entering sleep states or
>> reducing their frequency. One thing I witnessed is that if the Ceph cluster
>> is only running at low queue depths, and therefore only generating low CPU
>> load, all the cores throttle themselves down to their lowest speeds, which
>> really hurts latency.
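>>
>> For example (the exact flags depend on your distro and CPUs, so treat this
>> as a starting point rather than a recipe):
>>
>> # force the performance governor so cores don't clock down at low load
>> cpupower frequency-set -g performance
>>
>> # and/or limit deep C-states via the kernel command line, e.g.
>> intel_idle.max_cstate=1 processor.max_cstate=1
>>
>> Both trade power consumption for lower and more consistent write latency.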
>> 
>>> 
>>> Why do we get such bad sync IOPS, and how does Ceph handle it?
>>> Any reply would be very much appreciated!
>>> 
>>> 2016-02-25 22:44 GMT+08:00 Jason Dillaman <dillaman@xxxxxxxxxx>:
>>>> 35K IOPS with ioengine=rbd sounds like the "sync=1" option doesn't
>>>> actually work. Or it's not touching the same object (but I wonder whether
>>>> write ordering is preserved at that rate?).
>>> 
>>> The fio rbd engine does not support "sync=1"; however, it should support
>>> "fsync=1" to accomplish roughly the same effect.
>>> 
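>>> For example, a job file along these lines should exercise it (pool, image
>>> and client names are placeholders for whatever exists in your cluster):
>>>
>>> [rbd_sync_write]
>>> ioengine=rbd
>>> clientname=admin
>>> pool=rbd
>>> rbdname=fio_test
>>> rw=randwrite
>>> bs=4k
>>> iodepth=1
>>> fsync=1
>>>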
>>> Jason
>> 
> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



