On Wed, Aug 14, 2013 at 1:38 PM, Kasper Dieter <dieter.kasper@xxxxxxxxxxxxxx> wrote:
> On Wed, Aug 14, 2013 at 10:17:24PM +0200, Gregory Farnum wrote:
>> On Fri, Aug 9, 2013 at 2:03 AM, Kasper Dieter
>> <dieter.kasper@xxxxxxxxxxxxxx> wrote:
>> > OK,
>> > I found this nice page: http://ceph.com/docs/next/dev/file-striping/
>> > which explains "--stripe_unit --stripe_count --object_size"
>> >
>> > But still I'm not sure about
>> > (1) what is the equivalent command on cephfs to 'rbd create --order 16' ?
>>
>> There's not a direct one; CephFS lets you specify arbitrary sizes
>> (--stripe-unit) while rbd restricts you to powers of two. If you want
>> a new file to use a 64KB object size, you can just set the object_size
>> to 64KB.
>>
>> > (2) how to use those parameters to achieve different optimized layouts
>> >     on CephFS directories (e.g. for streaming, small sequential IOs,
>> >     small random IOs)
>>
>> If (as Yan suspects) you mean specifying how the directory itself is
>> laid out on disk, you can't; CephFS directories aren't maintained that
>> way and it wouldn't make any sense. If you're talking about making all
>> the files underneath it use a new layout, you can specify a directory
>> layout, which is applied to all new descendent files the same way you
>> specify the layout on an individual file.
>
> Thank you Greg,
>
> my question was which parameters of "--stripe_unit --stripe_count --object_size"
> would be optimal for new descendent files under directories
>   /mnt/cephfs/streaming
>   /mnt/cephfs/seq-IOs
>   /mnt/cephfs/rand-IOs
>
> e.g.
>   cephfs /mnt/cephfs/streaming set_layout -p 3 -s 4194304 -u 4194304 -c 1
>   cephfs /mnt/cephfs/seq-IOs   set_layout -p 3 -s 4194304 -u 65536   -c 8
>   cephfs /mnt/cephfs/rand-IOs  set_layout -p 3 -s 65536   -u 65536   -c 1

Ah. That will depend a lot on what your specific usage scenario looks
like. The stripe unit caps the size of an individual IO, so for large
sequential IOs you'll want it to be large. The stripe count determines
how many objects a given run of stripe units is spread across (e.g.,
64KB stripe units with a stripe count of 10 means the first 640KB of a
file land on ten separate objects before wrapping around to the first
one).

You might find that under certain benchmarking patterns your sequential
IO goes up if you use smaller stripe units striped across many objects,
but if you've got a writeback cache in the way I suspect it will be
fairly pointless, since the cache can aggregate those into a single
larger IO (which is preferable).

For random IO you probably (depending on your macro workload) want
smaller stripe units with a fairly wide stripe count, but perhaps a
larger object size (reducing the number of inodes the OSDs need to keep
track of). But really you just need to experiment; the aggregate
performance of different workloads against different striping policies
is still not a well-researched area, in Ceph or elsewhere.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
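
To make the striping math above concrete, here is a minimal Python
sketch of the offset-to-object mapping described at
http://ceph.com/docs/next/dev/file-striping/. The helper name and the
layout values are illustrative only, not part of any Ceph tool or API.

    def object_for_offset(offset, stripe_unit, stripe_count, object_size):
        """Map a file offset to (object_index, offset_within_object).

        Illustrative helper, assuming the layout rules from
        http://ceph.com/docs/next/dev/file-striping/
        """
        stripes_per_object = object_size // stripe_unit  # stripe units per object
        block = offset // stripe_unit          # which stripe unit holds this offset
        stripe_no = block // stripe_count      # which round-robin pass across the set
        stripe_pos = block % stripe_count      # which object within the current set
        object_set = stripe_no // stripes_per_object  # which group of stripe_count objects
        object_index = object_set * stripe_count + stripe_pos
        block_in_object = stripe_no % stripes_per_object
        return object_index, block_in_object * stripe_unit + offset % stripe_unit

    # 64KB stripe units, stripe count 10, 4MB objects (Greg's example above):
    su, sc, osz = 64 * 1024, 10, 4 * 1024 * 1024
    for off in (0, 64 * 1024, 576 * 1024, 640 * 1024):
        print(off, object_for_offset(off, su, sc, osz))
    # Offsets 0, 64KB, 576KB land on objects 0, 1, 9; 640KB wraps back to object 0.

Running it with that 64KB / count-10 layout shows the first 640KB of a
file spread across ten separate objects, with the next stripe unit
wrapping back to object 0, matching the behaviour described above.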