On Mon, Nov 28, 2016 at 6:20 PM, Francois Blondel <fblondel@xxxxxxxxxxxx> wrote:
> Hi *,
>
> I am currently testing different scenarios to try to optimize
> sequential read and write speeds using kernel RBD.
>
> I have two block devices, created with:
>
>   rbd create block1 --size 500G --pool rbd --image-feature layering
>   rbd create block132m --size 500G --pool rbd --image-feature layering --object-size 32M
>
> -> Writing to block1 works quite well: about 200 ops/s and 310 MB/s
> on average for a 250 GB file (tests run with dd).
> -> Writing to block132m is much slower (about 40 MB/s on average) and
> generates a very high op rate, from 4000 to 13000 ops/s as seen in
> ceph -w.
>
> Current test cluster:
>
>      health HEALTH_WARN
>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>      monmap e2: 3 mons at {aac=10.113.49.48:6789/0,aad=10.112.33.36:6789/0,aae=10.112.48.60:6789/0}
>             election epoch 26, quorum 0,1,2 aad,aae,aac
>      osdmap e10962: 38 osds: 38 up, 38 in
>             flags noscrub,nodeep-scrub,sortbitwise
>       pgmap v120464: 1024 pgs, 1 pools, 486 GB data, 122 kobjects
>             4245 GB used, 50571 GB / 54816 GB avail
>                 1024 active+clean
>
> The OSDs (using bluestore) were created with:
>
>   ceph-disk prepare --zap-disk --bluestore --cluster ceph --cluster-uuid XX..XX /dev/sdX
>
> ceph -v: ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>
> Does anyone have experience with "non-standard" RBD object sizes?
>
> Could this be due to bluestore, or has someone already hit this issue
> with filestore OSDs?

It's hard to tell without additional information: the exact dd command,
iostat or blktrace output, and probably some OSD logs as well.  A ton
of work has gone into bluestore in kraken, mostly on the performance
front - jewel bluestore has little in common with the current version.

> Should switching to a higher RBD object size at least theoretically
> improve sequential read/write speeds?

Well, it really depends on the workload.  It may result in an
improvement in certain cases, but there are many downsides - RADOS (be
it with filestore or bluestore) works much better with smaller objects.
I agree with Jason that you are probably better off with the default.

Try experimenting with krbd readahead instead: bump it to 4M or 8M or
even higher, and make sure you have a recent kernel (4.4 or newer) on
the client machine.  There were a number of threads on this subject on
ceph-users - search for "single thread sequential" or "kernel rbd
readahead".

Thanks,

                Ilya
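
P.S. For the readahead experiment, something along these lines should
do.  I'm assuming here that the image is mapped at /dev/rbd0 -
substitute your actual device:

  # check the current readahead for the mapping, in KB
  cat /sys/block/rbd0/queue/read_ahead_kb

  # bump it to 8M for this mapping
  echo 8192 | sudo tee /sys/block/rbd0/queue/read_ahead_kb

The setting doesn't survive unmapping the image, so you'd need a udev
rule or an init script to make it stick.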
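
P.P.S. If you rerun the tests, please include the exact dd invocation
and some iostat output from the OSD nodes.  For a 250 GB sequential
write that would be something like this - the mount point and count are
made up, adjust them to your setup:

  # sequential write, 250 GB in 4M chunks, bypassing the page cache
  dd if=/dev/zero of=/mnt/rbd/testfile bs=4M count=62500 oflag=direct

  # on the OSD nodes, while the test is running
  iostat -x 5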