Re: The max single write IOPS on single RBD

If you are testing with "iodepth=1", I'd recommend trying "rbd non blocking aio = false" in your Ceph config file to see if that improves your single-threaded IO performance.
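
For reference, here is a minimal sketch of what that could look like. "rbd non
blocking aio" is read by librbd, so this sketch assumes a librbd-based run via
fio's rbd engine rather than a kernel-mapped device; the client name, pool, and
image name below are just placeholders:

    # ceph.conf on the client
    [client]
        rbd non blocking aio = false

    # single-threaded 4k random writes at queue depth 1
    fio --name=single-write --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=testimg \
        --rw=randwrite --bs=4k --size=1000M --iodepth=1 --numjobs=1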

-- 

Jason Dillaman 


----- Original Message -----
> From: "Zhi Zhang" <zhang.david2011@xxxxxxxxx>
> To: "Sage Weil" <sage@xxxxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Sent: Sunday, December 13, 2015 10:10:58 PM
> Subject: Re: The max single write IOPS on single RBD
> 
> On Fri, Dec 11, 2015 at 9:15 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Fri, 11 Dec 2015, Zhi Zhang wrote:
> >> Hi Guys,
> >>
> >> We have a small 4-node cluster. Here is the hardware configuration:
> >>
> >> 11 x 300GB SSDs, 24 cores, and 32GB of memory per node.
> >> All nodes are connected over a single 1Gb/s network.
> >>
> >> So we have one Monitor and 44 OSDs for testing kernel RBD IOPS using
> >> fio. Here are the major fio options.
> >>
> >> -direct=1
> >> -rw=randwrite
> >> -ioengine=psync
> >> -size=1000M
> >> -bs=4k
> >> -numjobs=1
> >>
> >> The max IOPS we can achieve with a single writer (numjobs=1) is close to
> >> 1000, which means each RBD IO takes 1.x ms.
> >>
> >> From the OSD logs, we can also observe that most osd_ops take 1.x ms,
> >> including op processing, journal writing, replication, etc., before the
> >> commit is sent back to the client.
> >>
> >> The network RTT is around 0.04 ms;
> >> Most osd_ops on the primary OSD take around 0.5~0.7 ms, of which the
> >> journal write takes 0.3 ms;
> >> Most osd_repops, including the journal write on the peer OSD, take around 0.5 ms.
> >>
> >> We even tried modifying the journal to write to the page cache only, but
> >> didn't see a significant improvement. Does this mean this is the best result
> >> we can get for a single writer on a single RBD?
> >
> > What version is this?  There have been a few recent changes that will
> > reduce the wall clock time spent preparing/processing a request.  There is
> > still a fair bit of work to do here, though--the theoretical lower bound
> > is the SSD write time + 2x RTT (client <-> primary osd <-> replica osd <->
> > replica ssd).
> >
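
(For illustration, plugging the numbers reported earlier in this thread into
that bound: a ~0.3 ms journal/SSD write plus 2 x 0.04 ms of network RTT gives a
floor of roughly 0.38 ms per write, i.e. about 2,600 IOPS for a single
queue-depth-1 writer, versus the ~1,000 IOPS observed.)
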
> 
> The Ceph version is 0.94.1 with a few backports.
> 
> I have already seen some related changes. I will try a newer version and
> keep you guys updated.
> 
> Thanks.
> 
> > sage
> >


