Re: Ceph behavior on (lots of) small objects (RGW, RADOS + erasure coding)?

Hi,

you are probably running into the bluestore min alloc size, which is 64 kB on HDDs and 16 kB on SSDs. With k=5, m=2 you'd need objects of at least 320 kB on HDDs or 80 kB on SSDs to use the space efficiently.
Last time I checked, these values are fixed at OSD creation and cannot be changed afterwards.
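
To put rough numbers on that, here's a back-of-the-envelope sketch (it assumes each of the k+m shards is rounded up to min_alloc_size on its own OSD and ignores bluestore's other per-object overhead, so treat it as an approximation rather than an exact accounting):

    def on_disk_bytes(obj_size, k=5, m=2, min_alloc=64 * 1024):
        # each data shard holds roughly obj_size / k bytes, but space is
        # allocated in units of min_alloc_size; parity shards are the same size
        shard = max(-(-obj_size // k), min_alloc)  # ceil(obj_size / k), but at least min_alloc
        return shard * (k + m)

    for size in (4 * 1024, 320 * 1024, 4 * 1024 * 1024):
        used = on_disk_bytes(size)
        print(f"{size // 1024} kB object -> ~{used // 1024} kB on disk ({used / size:.1f}x)")

With the 64 kB HDD min alloc size that works out to roughly 112x for a 4 kB object, but only 1.4x once objects are 320 kB or larger.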

It's not necessarily the best idea to store a lot of very small objects in RADOS (or CephFS or RGW), but it really depends on your exact requirements and access patterns.


Paul


2018-06-27 11:32 GMT+02:00 Nicolas Dandrimont <olasd@xxxxxxxxxxxxxxxxxxxx>:
Hi,

I would like to use Ceph to store a lot of small objects. Our current usage
pattern is 4.5 billion unique objects, ranging from 0 to 100 MB in size, with a
median of 3-4 kB. Overall that's around 350 TB of raw data to store, which isn't
much, but it's spread across a *lot* of tiny files.

We expect to grow by around a third per year, and the object size distribution
to stay essentially the same (it's been stable for the past three years, and we
don't see that changing).

Our access pattern is a very simple key -> value store, where the key happens
to be the sha1 of the content we're storing. All metadata is stored externally,
and we really only need a dumb object store.

Our redundancy requirement is to be able to withstand the loss of 2 OSDs.

After looking at our options for storage in Ceph, I dismissed (perhaps hastily)
RGW for its metadata overhead, and went straight to plain RADOS. I've set up an
erasure-coded storage pool with otherwise default settings and k=5, m=2
(expecting a 40% increase in storage use over the plain contents).
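
For the record, creating such a pool looks roughly like this (the profile and
pool names and the PG count below are just placeholders):

    ceph osd erasure-code-profile set ec-5-2 k=5 m=2 crush-failure-domain=host
    ceph osd pool create objects 2048 2048 erasure ec-5-2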

After storing objects in the pool, I see a storage usage of 700% instead of
140%. My understanding of the erasure-code profile docs [1] is that objects
below the stripe width (k * stripe_unit, 20 kB in my case) can't be chunked
for erasure coding, so RADOS falls back to plain object copying with k+m
copies.

[1] http://docs.ceph.com/docs/master/rados/operations/erasure-code-profile/
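
(For completeness: "ceph osd erasure-code-profile get <profile>" prints a
profile's k and m, plus stripe_unit if it was set explicitly; as far as I can
tell the default stripe_unit is 4 kB, which is where the 20 kB stripe width
above comes from.)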

Is my understanding correct? Does anyone have experience with this kind of
storage workload in Ceph?

If my understanding is correct, I'll end up adding size tiering to my object
storage layer, shuffling objects into two pools with different settings
according to their size. That's not too bad, but I'd like to make sure I'm not
completely misunderstanding something.
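
In case it's useful, that tiering shim would be something like the following
python-rados sketch (pool names and the size cutoff are placeholders and would
need tuning; untested):

    import hashlib
    import rados

    SMALL_POOL = 'objects-replicated'  # hypothetical replicated pool for small objects
    BIG_POOL = 'objects-ec-5-2'        # hypothetical EC k=5,m=2 pool for the rest
    CUTOFF = 320 * 1024                # size threshold, to be tuned

    def store(cluster, data):
        """Write one object into the pool matching its size; the key is its sha1."""
        key = hashlib.sha1(data).hexdigest()
        pool = SMALL_POOL if len(data) < CUTOFF else BIG_POOL
        ioctx = cluster.open_ioctx(pool)
        try:
            ioctx.write_full(key, data)
        finally:
            ioctx.close()
        return key, pool

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        print(store(cluster, b'example content'))
    finally:
        cluster.shutdown()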

Thanks!
--
Nicolas Dandrimont
Backend Engineer, Software Heritage

BOFH excuse #170:
popper unable to process jumbo kernel

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
