Re: High ops/s with kRBD and "--object-size 32M"

Jason Dillaman <jdillama@xxxxxxxxxx> · Mon, 28 Nov 2016 12:45:16 -0500

To optimize for non-direct, sequential IO, you'd most likely be better
off with smaller RBD object sizes, not larger ones. The rationale is
that each backing object is handled by a single PG, so by using
smaller objects you can distribute the IO load across more PGs (and
their associated OSDs) in parallel. The 4MB default object size was
picked somewhat arbitrarily: small enough not to reduce parallelism,
but not so small that FileStore would have to manage an order of
magnitude more files. This is also why librbd supports "fancy"
striping, which creates the illusion of small objects to increase
parallelism for sequential IO. With BlueStore, the eventual hope is
that we will be able to reduce the default RBD object size, since it
*should* handle small objects more efficiently.
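
As a rough illustration of the parallelism argument (a sketch, not
Ceph code), the snippet below maps image byte offsets to backing
object numbers using the usual stripe-unit / stripe-count arithmetic,
then counts how many distinct objects a 16MB sequential window touches.
The names Layout, object_for_offset and objects_touched, and the
1MB x 8 striping values, are assumptions chosen only for the example.

```python
from collections import namedtuple

MiB = 1024 * 1024

# Hypothetical container for the three striping parameters (sizes in bytes).
Layout = namedtuple("Layout", "object_size stripe_unit stripe_count")


def object_for_offset(offset, layout):
    """Map an image byte offset to (object number, offset inside that object)."""
    stripes_per_object = layout.object_size // layout.stripe_unit
    block_no = offset // layout.stripe_unit        # which stripe unit overall
    stripe_no = block_no // layout.stripe_count    # which row of stripe units
    stripe_pos = block_no % layout.stripe_count    # which column (object in set)
    object_set = stripe_no // stripes_per_object   # which group of objects
    object_no = object_set * layout.stripe_count + stripe_pos
    object_off = ((stripe_no % stripes_per_object) * layout.stripe_unit
                  + offset % layout.stripe_unit)
    return object_no, object_off


def objects_touched(start, length, layout):
    """Distinct objects hit by a sequential range.

    Assumes `start` is aligned to the stripe unit, so stepping by one
    stripe unit visits every stripe unit in the range exactly once.
    """
    return {object_for_offset(off, layout)[0]
            for off in range(start, start + length, layout.stripe_unit)}


default_4m = Layout(4 * MiB, 4 * MiB, 1)    # default: one stripe unit per object
plain_32m = Layout(32 * MiB, 32 * MiB, 1)   # --object-size 32M, no striping
striped_32m = Layout(32 * MiB, 1 * MiB, 8)  # illustrative striping values

window = 16 * MiB
for name, layout in [("4M objects", default_4m),
                     ("32M objects", plain_32m),
                     ("32M objects, 1M x 8 stripes", striped_32m)]:
    print(name, "->", len(objects_touched(0, window, layout)), "object(s)")
# 4M objects -> 4 object(s)
# 32M objects -> 1 object(s)
# 32M objects, 1M x 8 stripes -> 8 object(s)
```

With plain 32M objects the whole window lands on one object, hence one
PG, while 4M objects or striping spread the same window over several.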

On Mon, Nov 28, 2016 at 12:20 PM, Francois Blondel
<fblondel@xxxxxxxxxxxx> wrote:
> Hi *,
>
> I am currently testing different scenarios to try to optimize sequential
> read and write speeds using Kernel RBD.
>
> I have two block devices created with :
>   rbd create block1 --size 500G --pool rbd --image-feature layering
>   rbd create block132m --size 500G --pool rbd --image-feature layering --object-size 32M
>
> -> Writing to block1 works fine (about 200 ops/s, 310MB/s on average,
> for a 250GB file) (tests run with dd)
> -> Writing to block132m is much slower (about 40MB/s on average) and
> generates high ops/s (from 4000 to 13000, as seen in ceph -w)
>
> Current test cluster:
>
>      health HEALTH_WARN
>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>      monmap e2: 3 mons at
> {aac=10.113.49.48:6789/0,aad=10.112.33.36:6789/0,aae=10.112.48.60:6789/0}
>             election epoch 26, quorum 0,1,2 aad,aae,aac
>      osdmap e10962: 38 osds: 38 up, 38 in
>             flags noscrub,nodeep-scrub,sortbitwise
>       pgmap v120464: 1024 pgs, 1 pools, 486 GB data, 122 kobjects
>             4245 GB used, 50571 GB / 54816 GB avail
>                 1024 active+clean
>
> The OSDs (using bluestore) have been created using:
>     ceph-disk prepare --zap-disk --bluestore --cluster ceph --cluster-uuid XX..XX  /dev/sdX
>
>     ceph -v :   ceph version 10.2.3
> (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>
>
> Does anyone have experience with "non-standard" RBD "object-size"
> values?
>
> Could this be due to "bluestore", or has someone already encountered
> this issue using "filestore" OSDs?
>
> Should switching to a higher RBD "object-size" at least theoretically
> improve seq r/w speeds?
>
> Many thanks,
> François
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-- 
Jason