On Mon, Nov 28, 2016 at 6:20 PM, Francois Blondel <fblondel@xxxxxxxxxxxx> wrote:
> Hi *,
>
> I am currently testing different scenarios to try to optimize
> sequential read and write speeds using kernel RBD.
>
> I have two block devices, created with:
>
>   rbd create block1 --size 500G --pool rbd --image-feature layering
>   rbd create block132m --size 500G --pool rbd --image-feature layering --object-size 32M
>
> -> Writing to block1 works quite well: about 200 ops/s and 310 MB/s
> on average for a 250 GB file (tests run with dd).
> -> Writing to block132m is much slower (about 40 MB/s on average) and
> generates a very high op rate, from 4000 to 13000 ops/s as seen in
> ceph -w.
>
> Current test cluster:
>
>      health HEALTH_WARN
>             noscrub,nodeep-scrub,sortbitwise flag(s) set
>      monmap e2: 3 mons at {aac=10.113.49.48:6789/0,aad=10.112.33.36:6789/0,aae=10.112.48.60:6789/0}
>             election epoch 26, quorum 0,1,2 aad,aae,aac
>      osdmap e10962: 38 osds: 38 up, 38 in
>             flags noscrub,nodeep-scrub,sortbitwise
>       pgmap v120464: 1024 pgs, 1 pools, 486 GB data, 122 kobjects
>             4245 GB used, 50571 GB / 54816 GB avail
>                 1024 active+clean
>
> The OSDs (using bluestore) were created with:
>
>   ceph-disk prepare --zap-disk --bluestore --cluster ceph --cluster-uuid XX..XX /dev/sdX
>
> ceph -v: ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>
> Does anyone have experience with "non-standard" RBD object sizes?
>
> Could this be due to bluestore, or has someone already hit this issue
> with filestore OSDs?

It's hard to tell without additional information: the exact dd command,
iostat or blktrace output, and probably some OSD logs as well.  A ton
of work has gone into bluestore in kraken, mostly on the performance
front - jewel bluestore has little in common with the current version.

> Should switching to a higher RBD object size at least theoretically
> improve sequential read/write speeds?

Well, it really depends on the workload.  It may result in an
improvement in certain cases, but there are many downsides - RADOS (be
it with filestore or bluestore) works much better with smaller objects.
I agree with Jason that you are probably better off with the default.

Try experimenting with krbd readahead instead: bump it to 4M or 8M or
even higher, and make sure you have a recent kernel (4.4 or newer) on
the client machine.  There were a number of threads on this subject on
ceph-users - search for "single thread sequential" or "kernel rbd
readahead".

Thanks,

                Ilya
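
P.S. For the readahead experiment, something along these lines should
do.  I'm assuming here that the image is mapped at /dev/rbd0 -
substitute your actual device:

  # check the current readahead for the mapping, in KB
  cat /sys/block/rbd0/queue/read_ahead_kb

  # bump it to 8M for this mapping
  echo 8192 | sudo tee /sys/block/rbd0/queue/read_ahead_kb

The setting doesn't survive unmapping the image, so you'd need a udev
rule or an init script to make it stick.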
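
P.P.S. If you rerun the tests, please include the exact dd invocation
and some iostat output from the OSD nodes.  For a 250 GB sequential
write that would be something like this - the mount point and count are
made up, adjust them to your setup:

  # sequential write, 250 GB in 4M chunks, bypassing the page cache
  dd if=/dev/zero of=/mnt/rbd/testfile bs=4M count=62500 oflag=direct

  # on the OSD nodes, while the test is running
  iostat -x 5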