Sumit,

I think random read/write will always outperform sequential read/write in Ceph if you don't have any kind of cache in front or don't have proper striping enabled on the image. The reasons are the following:

1. If you are using the default image options, the object size is 4 MB, the stripe unit equals the object size (4 MB), and the stripe count is 1.
2. You didn't mention your write size, but if it is less than 4 MB, two sequential writes will always land on the same object, and therefore the same PG, and will be serialized within the OSD.
3. Two random writes, on the other hand, will most probably land on different PGs and will be processed in parallel.
4. The same happens for random vs. sequential reads. Increasing read_ahead_kb to a reasonably large number will improve the sequential read speed. If you are using librbd, rbd_cache should help both reads and writes, I guess; a config sketch follows further down.
5. Another option you may want to try is setting the stripe_size/object_size/stripe_unit to your IO size, so that sequential reads/writes land on different objects; in that case the difference should go away. An example command follows.
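As a rough illustration of point 5, the striping is chosen at image-creation time on a format-2 image. Something along these lines should work; the pool/image name and the 64 KB x 64 layout are only illustrative, so pick a stripe unit that matches your actual IO size:

    # Illustrative only: 4 MB objects (--order 22) striped in 64 KB units
    # across 64 objects. Small sequential IOs then fan out over many
    # objects (and hence PGs) instead of queueing inside a single object.
    rbd create rbd/stripetest --size 10240 --image-format 2 \
        --order 22 --stripe-unit 65536 --stripe-count 64

With such a layout, 64 consecutive 64 KB writes touch 64 different objects, whereas with the default layout they would all land in the same 4 MB object.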
Hope this is helpful.
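One more thing, for what it's worth: the client-side cache mentioned in point 4, and asked about in the quoted message below, is normally turned on in the [client] section of ceph.conf on the librbd client. A minimal sketch with illustrative values (and a placeholder guest device name for the readahead tweak):

    # ceph.conf on the client running librbd/QEMU -- illustrative values:
    [client]
        rbd cache = true
        rbd cache writethrough until flush = true

    # read_ahead_kb is a kernel block-layer knob, tuned inside the guest
    # (or on a krbd client); /dev/vdb is just a placeholder device here:
    echo 4096 > /sys/block/vdb/queue/read_ahead_kb

Whether rbd_cache is on by default depends on your Ceph version, so it is worth setting it explicitly while benchmarking.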
Thanks & Regards,
Somnath

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Sumit Gaur

Hi All,

What I saw is that after enabling the RBD cache it works as expected, i.e. sequential writes get better MBps than random writes. Can somebody explain this behaviour? Is the RBD cache setting a must for a Ceph cluster to behave normally?

Thanks,
Sumit

On Mon, Feb 2, 2015 at 9:59 AM, Sumit Gaur <sumitkgaur@xxxxxxxxx> wrote:

Hi Florent,

Cache tiering: no.

Our architecture: vdbench/FIO inside VM <--> RBD without cache <--> Ceph Cluster (6 OSDs + 3 Mons)

Thanks,
Sumit

[root@ceph-mon01 ~]# ceph -s
    cluster 47b3b559-f93c-4259-a6fb-97b00d87c55a
     health HEALTH_WARN clock skew detected on mon.ceph-mon02, mon.ceph-mon03
     monmap e1: 3 mons at {ceph-mon01=192.168.10.19:6789/0,ceph-mon02=192.168.10.20:6789/0,ceph-mon03=192.168.10.21:6789/0},
            election epoch 14, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
     osdmap e603: 36 osds: 36 up, 36 in
      pgmap v40812: 5120 pgs, 2 pools, 179 GB data, 569 kobjects
            522 GB used, 9349 GB / 9872 GB avail
                 5120 active+clean

On Mon, Feb 2, 2015 at 12:21 AM, Florent MONTHEL <fmonthel@xxxxxxxxxxxxx> wrote:

Hi Sumit
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com