To optimize for non-direct, sequential IO, you'd actually most likely be
better off with smaller RBD object sizes. The rationale is that each backing
object is handled by a single PG, so by using smaller objects you can
distribute the IO load across more PGs (and associated OSDs) in parallel. The
4MB default object size was somewhat arbitrarily picked: not so large that it
reduces parallelism, but not so small that FileStore ends up with an order of
magnitude more files to manage. This is also why librbd supports "fancy"
striping, which creates the illusion of small objects to increase the
parallelism of sequential IO (a rough example is sketched below the quoted
message). With BlueStore, the eventual hope is that we will be able to reduce
the default RBD object size, since it *should* handle small objects more
efficiently.

On Mon, Nov 28, 2016 at 12:20 PM, Francois Blondel <fblondel@xxxxxxxxxxxx> wrote:
> Hi *,
>
> I am currently testing different scenarios to try to optimize sequential
> read and write speeds using kernel RBD.
>
> I have two block devices created with:
>   rbd create block1 --size 500G --pool rbd --image-feature layering
>   rbd create block132m --size 500G --pool rbd --image-feature layering --object-size 32M
>
> -> Writing to block1 works quite fine (about 200 ops/s, 310 MB/s on
>    average, for a 250 GB file) (tests run with dd).
> -> Writing to block132m is much slower (about 40 MB/s on average) and
>    generates far more ops/s (4000 to 13000, as seen from ceph -w).
>
> Current test cluster:
>
>     health HEALTH_WARN
>            noscrub,nodeep-scrub,sortbitwise flag(s) set
>     monmap e2: 3 mons at {aac=10.113.49.48:6789/0,aad=10.112.33.36:6789/0,aae=10.112.48.60:6789/0}
>            election epoch 26, quorum 0,1,2 aad,aae,aac
>     osdmap e10962: 38 osds: 38 up, 38 in
>            flags noscrub,nodeep-scrub,sortbitwise
>      pgmap v120464: 1024 pgs, 1 pools, 486 GB data, 122 kobjects
>            4245 GB used, 50571 GB / 54816 GB avail
>                1024 active+clean
>
> The OSDs (using BlueStore) have been created using:
>   ceph-disk prepare --zap-disk --bluestore --cluster ceph --cluster-uuid XX..XX /dev/sdX
>
> ceph -v: ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>
> Does someone have any experience with "non-standard" RBD object sizes?
>
> Could this be due to BlueStore, or has someone already encountered this
> issue with FileStore OSDs?
>
> Should switching to a higher RBD object size at least theoretically
> improve sequential r/w speeds?
>
> Many thanks,
> François

--
Jason
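
For reference, a minimal sketch of what "fancy" striping looks like at image
creation time. The image name and the 512K x 8 stripe values below are only
illustrative and not taken from the thread; also note that at this release the
kernel RBD client did not support non-default striping, so this applies to
librbd-based consumers (e.g. QEMU/librbd):

  # Keep the default 4M objects, but write them in 512K stripes spread
  # across 8 objects at a time, so sequential IO fans out over more PGs/OSDs.
  rbd create striped-test --size 500G --pool rbd --image-feature layering \
      --stripe-unit 524288 --stripe-count 8

  # Verify the resulting object size and striping parameters.
  rbd info rbd/striped-test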