Re: Guest sync write iops so poor.

fio against /dev/rbd0 with sync=1 shows no problem.
I can't find any 'sync cache' code in the Linux rbd block driver or the radosgw API.
It seems the sync cache is a concept only in librbd (for the rbd cache).
Just my concern.
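
For reference, the block-device test I mean looks roughly like this (device name and runtime are just examples):

    # sync=1 opens the device with O_SYNC, so every write is a
    # synchronous write through the kernel rbd driver
    fio --name=synctest --filename=/dev/rbd0 --rw=randwrite --bs=4k \
        --direct=1 --sync=1 --iodepth=1 --numjobs=1 \
        --time_based --runtime=60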

2016-02-26 17:30 GMT+08:00 Huan Zhang <huan.zhang.jn@xxxxxxxxx>:
Hi Nick,
A DB's IO pattern depends on its configuration; take mysql as an example.
With innodb_flush_log_at_trx_commit = 1, mysql syncs after every transaction, like:
write
sync
write
sync
...

With innodb_flush_log_at_trx_commit = 2, the log is written at each commit but only synced periodically:
write
write
write
write
write
sync

With innodb_flush_log_at_trx_commit = 0:
write
write
...
(one second later)
sync


This may not be entirely accurate, but it is the pattern more or less.
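
If it helps, the three patterns can be roughly emulated with fio's fsync option (path, file size, and the batch count are placeholders):

    # like trx_commit = 1: fsync after every write
    fio --name=trx1 --filename=/mnt/test/redolog --size=1G --rw=write --bs=16k --fsync=1
    # like trx_commit = 2: fsync only after every N writes
    fio --name=trx2 --filename=/mnt/test/redolog --size=1G --rw=write --bs=16k --fsync=100
    # like trx_commit = 0: no per-write fsync at all
    fio --name=trx0 --filename=/mnt/test/redolog --size=1G --rw=write --bs=16k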

We tested mysql TPS with innodb_flush_log_at_trx_commit = 1 and got very poor performance, even though we can reach very high O_DIRECT randwrite IOPS with fio.
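
(For anyone who wants to reproduce this, a sysbench run along these lines should show the same effect; the host, credentials and table sizes are placeholders, and the syntax assumes sysbench 1.0:)

    sysbench oltp_write_only --mysql-host=127.0.0.1 --mysql-user=sbtest \
        --mysql-password=secret --tables=8 --table-size=1000000 prepare
    sysbench oltp_write_only --mysql-host=127.0.0.1 --mysql-user=sbtest \
        --mysql-password=secret --tables=8 --table-size=1000000 --threads=16 run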




2016-02-26 16:59 GMT+08:00 Nick Fisk <nick@xxxxxxxxxx>:
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Huan Zhang
> Sent: 26 February 2016 06:50
> To: Jason Dillaman <dillaman@xxxxxxxxxx>
> Cc: josh durgin <josh.durgin@xxxxxxxxxxx>; Nick Fisk <nick@xxxxxxxxxx>;
> ceph-users <ceph-users@xxxxxxxx>
> Subject: Re: Guest sync write iops so poor.
>
> rbd engine with fsync=1 seems stuck.
> Jobs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
> 1244d:10h:39m:18s]
>
> But fio on /dev/rbd0 with sync=1 direct=1 ioengine=libaio iodepth=64 gets very
> high iops, ~35K, similar to direct write.
>
> I'm confused by that result. IMHO, ceph could just ignore the sync cache
> command, since it always uses sync writes to the journal, right?

Even if the data is not sync'd to the data-storage part of the OSD, it still has to be written to the journal, and that is where the performance limit lies.

The very nature of SDS means you are never going to achieve the same latency as you do to a local disk: even if the software side introduced no extra latency, the network latency alone will severely limit your sync performance.
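
As a rough back-of-envelope figure (the latencies here are assumptions, not measurements): at queue depth 1, sync IOPS is roughly 1 / total write latency. A local SSD completing a sync write in ~50us gives ~20,000 IOPS, while a replicated Ceph write that spends ~1ms on network hops plus the journal write gives ~1,000 IOPS, no matter how fast the underlying disks are.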

Do you know the IO pattern the DBs generate? I know you can switch most DBs to flush with O_DIRECT instead of sync; that might help in your case.
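
For mysql/innodb that is a one-line change (a sketch; adjust for your setup):

    [mysqld]
    # open data files with O_DIRECT; redo logs are still fsync'd
    innodb_flush_method = O_DIRECT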

Also check out the tech talk from last month about high-performance databases on Ceph. The presenter gave the impression that, at least in their case, not every write was a sync IO, so your results could matter less than you think.

Also, please search the lists and past presentations on reducing write latency. There are a few things you can do, such as disabling logging and setting some kernel parameters to stop the CPUs from entering sleep states or reducing their frequency. One thing I have witnessed: if the Ceph cluster is only running at low queue depths, and so only generating low CPU load, all the cores throttle down to their lowest speeds, which really hurts latency.
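
For example, something along these lines (tooling varies by distro; these are the knobs I mean):

    # pin the frequency governor to performance on all cores
    cpupower frequency-set -g performance
    # and/or limit deep C-states via the kernel command line:
    #   intel_idle.max_cstate=1 processor.max_cstate=1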

>
> Why do we get such bad sync iops? How does ceph handle it?
> Your reply would be very much appreciated!
>
> 2016-02-25 22:44 GMT+08:00 Jason Dillaman <dillaman@xxxxxxxxxx>:
> > 35K IOPS with ioengine=rbd sounds like the "sync=1" option doesn't actually
> > work. Or it's not touching the same object (but I wonder whether write
> > ordering is preserved at that rate?).
>
> The fio rbd engine does not support "sync=1"; however, it should support
> "fsync=1" to accomplish roughly the same effect.
>
> Jason
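
(For reference, a minimal fio job file for the rbd engine with fsync=1 might look like the sketch below; the pool, image and client names are assumptions:)

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=test
    rw=randwrite
    bs=4k
    fsync=1
    [rbd-sync-test]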




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
