Hi Paul. Thanks for your comments.

> For your examples:
>
> 16 MB file -> 4x 4 MB objects -> 4x 4x 1 MB data chunks, 4x 2x 1 MB
> coding chunks
>
> 512 kB file -> 1x 512 kB object -> 4x 128 kB data chunks, 2x 128 kB
> coding chunks
>
> You'll run into different problems once the erasure coded chunks end
> up being smaller than 64kb each due to bluestore min allocation sizes
> and general metadata overhead making erasure coding a bad fit for very
> small files.

Thanks for the clarification, which makes this a "very bad fit" for our
CephFS use case:

# find . -type f -print0 | xargs -0 stat | grep Size | \
    perl -ane '/Size: (\d+)/; print $1 . "\n";' | ministat -n
x <stdin>
    N           Min           Max        Median           Avg        Stddev
x 12651568      0  1.0840049e+11      9036     2217611.6     32397960

That gives me ~6.3M files smaller than the 9036-byte median, each of which
will be stored as 6 x 64KB at the bluestore level if I understand it
correctly. We come from an xfs world where the default block size is 4K,
so this situation worked quite nicely there. I guess I would probably be
far better off solving this case with an RBD image and xfs on top.

Is it fair to summarize your input as:

In an EC4+2 configuration, the minimum space used is 256KB + 128KB
(coding) per file, regardless of file size.

In an EC8+3 configuration, the minimum space used is 512KB + 192KB
(coding) per file, regardless of file size.

(A rough back-of-the-envelope check of these numbers is sketched at the
bottom of this mail.)

And for the access side: any read of a file in an EC pool requires, at a
minimum, I/O requests to k shards before the first bytes can be returned;
with fast_read it becomes k+m requests, returning as soon as the first k
have responded.

Any experience with inlining data on the MDS? That would obviously help
here, I guess.

Thanks.

--
Jesper
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
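
To make the overhead summary above concrete, here is a rough
back-of-the-envelope sketch. It is my own approximation, not exact Ceph
space accounting: it assumes every shard is rounded up to the bluestore
min allocation size (64 KiB, the HDD default discussed above), that the
whole file fits in a single RADOS object, and it ignores per-object
metadata and omap overhead. ec_space() is just an illustrative helper,
not anything from the Ceph code base.

#!/usr/bin/env perl
# Approximate bluestore space allocated for one small file in an EC k+m
# pool, assuming each shard is rounded up to min_alloc (64 KiB here).
use strict;
use warnings;
use POSIX qw(ceil);

sub ec_space {
    my ($size, $k, $m, $min_alloc) = @_;
    $min_alloc //= 64 * 1024;
    # Each of the k data shards carries roughly 1/k of the object...
    my $per_shard = ceil($size / $k);
    # ...but bluestore never allocates less than min_alloc per shard,
    # and rounds every shard up to a multiple of min_alloc.
    $per_shard = $min_alloc if $per_shard < $min_alloc;
    $per_shard = ceil($per_shard / $min_alloc) * $min_alloc;
    # k data shards plus m coding shards are all the same size.
    return ($k + $m) * $per_shard;
}

# Median file size from the ministat output above is 9036 bytes.
for my $profile ([4, 2], [8, 3]) {
    my ($k, $m) = @$profile;
    printf "EC%d+%d: 9036-byte file -> ~%d KiB allocated\n",
        $k, $m, ec_space(9036, $k, $m) / 1024;
}

For the 9036-byte median this prints ~384 KiB for EC4+2 and ~704 KiB for
EC8+3, i.e. the 256KB+128KB and 512KB+192KB minimums from the summary.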