The "random" may come from ceph trunks. For RBD, Ceph trunk the image to 4M(default) objects, for Rados bench , it already 4M objects if you didn't set the parameters. So from XFS's view, there are lots of 4M files, in default, with ag!=1 (allocation group, specified during mkfs, default seems to be 32 or more), the files will be spread across the allocation groups, which results some random pattern as you can see from blktrace. AG=1 may works for single client senarios, but should not be that useful for a multi-tenant environment since the access pattern is a mixture of all tenant, shoud be random enough. One thing you may try is set the /sys/block/{disk}/queue/readahead_kb= 1024 or 2048, that should be helpful for sequential read performance. -----Original Message----- From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Gregory Farnum Sent: Tuesday, August 27, 2013 5:25 AM To: Samuel Just Cc: ceph-users@xxxxxxxxxxxxxx; daniel pol Subject: Re: Sequential placement In addition to that, Ceph uses full data journaling - if you have two journals on the OS drive then you'll be limited to what that OS drive can provide, divided by two (if you have two-copy happening). -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Aug 26, 2013 at 2:09 PM, Samuel Just <sam.just@xxxxxxxxxxx> wrote: > I think rados bench is actually creating new objects with each IO. > Can you paste in the command you used? > -Sam > > On Tue, Aug 20, 2013 at 7:28 AM, daniel pol <daniel_pol@xxxxxxxxxxx> wrote: >> Hi ! >> >> Ceph newbie here with a placement question. I'm trying to get a >> simple Ceph setup to run well with sequential reads big packets (>256k). >> This is for learning/benchmarking purpose only and the setup I'm >> working with has a single server with 2 data drives, one OSD on each, >> journals on the OS drive, no replication, dumpling release. >> When running rados bench or using a rbd block device the performance >> is only 35%-50% of what the underlying XFS filesystem can do and when >> I look at the IO trace I see random IO going to the physical disk, >> while the IO at ceph layer is sequential. Haven't tested CephFS yet >> but expect similar results there. >> Looking for advice on how to configure Ceph to generate sequential >> read IO pattern to the underlying physical disk. >> >> Have a nice day, >> Dani >> >> _______________________________________________ >> ceph-users mailing list >> ceph-users@xxxxxxxxxxxxxx >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > _______________________________________________ > ceph-users mailing list > ceph-users@xxxxxxxxxxxxxx > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com