Hi Mark,

I was seeing 50%... Oh, right: I run with newstore_aio = false, so maybe AIO already exploits that parallelism. It's interesting: we have two ways to parallelize the I/Os here:

1. Sync I/O (likely using DIO if the request is aligned) spread over multiple WAL threads (newstore_aio = false, newstore_sync_wal_apply = false, newstore_wal_threads = N).
2. Async I/O issued by kv_sync_thread (newstore_aio = true, newstore_sync_wal_apply = true; newstore_wal_threads does not matter in this mode).

Do we have any prior knowledge about which approach is better on a given kind of device? I suspect AIO will be better on HDD, while sync I/O with multiple threads will be better on SSD.
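Just to be concrete, here is a rough ceph.conf sketch of the two modes (illustrative only: the thread count is a placeholder, and please double-check the option names against the current newstore branch):

    [osd]
        # Mode 1: sync I/O (DIO when aligned), fanned out over N WAL threads
        newstore_aio = false
        newstore_sync_wal_apply = false
        newstore_wal_threads = 8          # the "N" above; 8 is a placeholder, tune per device

        # Mode 2: async I/O submitted from kv_sync_thread
        # (use these instead of the Mode 1 settings above)
        #newstore_aio = true
        #newstore_sync_wal_apply = true
        # newstore_wal_threads has no effect in this mode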
Xiaoxi

> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Mark Nelson
> Sent: Thursday, April 30, 2015 3:06 AM
> To: Chen, Xiaoxi; kernel neophyte
> Cc: ceph-devel
> Subject: Re: newstore performance update
>
> Hi Xiaoxi,
>
> I just tried setting newstore_sync_wal_apply to false, but it seemed to make
> very little difference for me. How much improvement were you seeing with it?
>
> Mark
>
> On 04/29/2015 10:55 AM, Chen, Xiaoxi wrote:
> > Hi Mark,
> >     You may have missed this tunable: newstore_sync_wal_apply. It defaults to
> > true, but it is better to set it to false. If sync_wal_apply is true, the WAL apply
> > is done synchronously (in kv_sync_thread) instead of in the WAL threads. See:
> >
> >     if (g_conf->newstore_sync_wal_apply) {
> >       _wal_apply(txc);
> >     } else {
> >       wal_wq.queue(txc);
> >     }
> >
> > Setting this to false helps a lot in my setup. Everything else looks good.
> >
> > Also, could you put the WAL in a different partition on the same SSD as the DB?
> > Then, from iostat -p, we can see how much is written to the DB and how much to
> > the WAL. I am always seeing zero in my setup.
> >
> >     Xiaoxi.
> >
> >> -----Original Message-----
> >> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Mark Nelson
> >> Sent: Wednesday, April 29, 2015 9:09 PM
> >> To: kernel neophyte
> >> Cc: ceph-devel
> >> Subject: Re: newstore performance update
> >>
> >> Hi,
> >>
> >> ceph.conf file attached. It's a little ugly because I've been
> >> playing with various parameters. You'll probably want to enable
> >> debug newstore = 30 if you plan to do any debugging. Also, the code
> >> has been changing quickly, so performance may have changed if you
> >> haven't tested within the last week.
> >>
> >> Mark
> >>
> >> On 04/28/2015 09:59 PM, kernel neophyte wrote:
> >>> Hi Mark,
> >>>
> >>> I am trying to measure 4k RW performance on Newstore, and I am not
> >>> anywhere close to the numbers you are getting!
> >>>
> >>> Could you share your ceph.conf for these tests?
> >>>
> >>> -Neo
> >>>
> >>> On Tue, Apr 28, 2015 at 5:07 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> >>>> Nothing official, though roughly from memory:
> >>>>
> >>>> ~1.7GB/s and something crazy like 100K IOPS for the SSD.
> >>>>
> >>>> ~150MB/s and ~125-150 IOPS for the spinning disk.
> >>>>
> >>>> Mark
> >>>>
> >>>> On 04/28/2015 07:00 PM, Venkateswara Rao Jujjuri wrote:
> >>>>> Thanks for sharing; the newstore numbers look a lot better.
> >>>>>
> >>>>> Wondering if we have any baseline numbers to put things into perspective,
> >>>>> like what it is on XFS or on librados?
> >>>>>
> >>>>> JV
> >>>>>
> >>>>> On Tue, Apr 28, 2015 at 4:25 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> >>>>>> Hi Guys,
> >>>>>>
> >>>>>> Sage has been furiously working away at fixing bugs in newstore
> >>>>>> and improving performance. Specifically, we've been focused on
> >>>>>> write performance, as newstore was previously lagging filestore
> >>>>>> by quite a bit. A lot of work has gone into implementing libaio
> >>>>>> behind the scenes, and as a result performance on spinning disks
> >>>>>> with an SSD WAL (and SSD-backed rocksdb) has improved pretty
> >>>>>> dramatically. It's now often beating filestore:
> >>>>>>
> >>>>>> http://nhm.ceph.com/newstore/newstore-5d96fe6-no_overlay.pdf
> >>>>>>
> >>>>>> On the other hand, sequential writes are slower than random
> >>>>>> writes when the OSD, DB, and WAL are all on the same device, be it
> >>>>>> a spinning disk or an SSD. In this situation newstore does better
> >>>>>> with random writes and sometimes beats filestore (such as in the
> >>>>>> everything-on-spinning-disk tests, and when IO sizes are small in
> >>>>>> the everything-on-SSD tests).
> >>>>>>
> >>>>>> Newstore is changing daily, so keep in mind that these results are
> >>>>>> almost assuredly going to change. An interesting area of
> >>>>>> investigation will be why sequential writes are slower than
> >>>>>> random writes, and whether or not we are being limited by rocksdb
> >>>>>> ingest speed, and how.
> >>>>>>
> >>>>>> I've also uploaded a quick perf call graph I grabbed during the
> >>>>>> "all-SSD" 32KB sequential write test to see if rocksdb was starving
> >>>>>> one of the cores, but found something that looks quite a bit different:
> >>>>>>
> >>>>>> http://nhm.ceph.com/newstore/newstore-5d96fe6-no_overlay.pdf
> >>>>>>
> >>>>>> Mark