Ceph behavior on (lots of) small objects (RGW, RADOS + erasure coding)?

Hi,

I would like to use ceph to store a lot of small objects. Our current usage
pattern is 4.5 billion unique objects, ranging from 0 to 100MB, with a median
size of 3-4kB. Overall, that's around 350 TB of raw data to store, which isn't
much, but that's across a *lot* of tiny files.

We expect growth of around a third per year, and we expect the object size
distribution to stay essentially the same (it's been stable for the past three
years, and we don't see that changing).

Our object access pattern is a very simple key -> value store, where the key
happens to be the sha1 of the content we're storing. All metadata is stored
externally, and we really only need a dumb object store.

Our redundancy requirement is to be able to withstand the loss of 2 OSDs.

After looking at our options for storage in Ceph, I dismissed (perhaps hastily)
RGW for its metadata overhead, and went straight to plain RADOS. I've set up an
erasure coded storage pool with default settings, k=5 and m=2 (expecting a 40%
increase in storage use over the plain contents).

After storing objects in the pool, I see a storage usage of 700% instead of
140%. My understanding of the erasure code profile docs[1] is that objects that
are below the stripe width (k * stripe_unit, which in my case is 20KB) can't be
chunked for erasure coding, which makes RADOS fall back to plain object
copying, with k+m copies.

[1] http://docs.ceph.com/docs/master/rados/operations/erasure-code-profile/
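To make the numbers concrete, here's a back-of-envelope model of what I think
is happening (this encodes my hypothesis above, not confirmed Ceph behavior;
the helper name and the 4 kB default stripe_unit are my assumptions):

```python
K, M = 5, 2                      # erasure-code profile from the pool
STRIPE_UNIT = 4096               # assumed default stripe_unit: 4 kB
STRIPE_WIDTH = K * STRIPE_UNIT   # 20 kB, below which chunking fails

def raw_usage(obj_size: int) -> float:
    """Raw bytes consumed per byte stored, assuming objects smaller
    than the stripe width fall back to k+m full copies."""
    if obj_size < STRIPE_WIDTH:
        return float(K + M)      # 7.0 -> the ~700% I'm observing
    return (K + M) / K           # 1.4 -> the ~140% I expected

print(raw_usage(4 * 1024))       # a 4 kB object (our median size)
print(raw_usage(1024 * 1024))    # a 1 MB object
```

With a 3-4 kB median, almost all of our objects would land in the k+m branch,
which would explain the 700% usage we're seeing pool-wide.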

Is my understanding correct? Does anyone have experience with this kind of
storage workload in Ceph?

If my understanding is correct, I'll end up adding size tiering in my object
storage layer, routing objects between two pools with different settings
according to their size. That's not too bad, but I'd like to make sure I'm not
completely misunderstanding something.
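The tiering I have in mind would be something like this sketch (pool names and
the threshold are placeholders; the replicated pool would use size=3 to meet
the same two-OSD-loss requirement, at 300% raw usage instead of 700%):

```python
K, STRIPE_UNIT = 5, 4096
STRIPE_WIDTH = K * STRIPE_UNIT   # 20 kB cutoff, per the EC profile

def pick_pool(obj_size: int) -> str:
    """Route small objects to a 3-way replicated pool and larger,
    chunkable objects to the k=5, m=2 erasure-coded pool."""
    if obj_size < STRIPE_WIDTH:
        return "objects-replicated"   # placeholder pool name
    return "objects-ec"               # placeholder pool name

print(pick_pool(3500))        # median-sized object -> replicated pool
print(pick_pool(100 << 20))   # 100 MB object -> erasure-coded pool
```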

Thanks!
-- 
Nicolas Dandrimont
Backend Engineer, Software Heritage

BOFH excuse #170:
popper unable to process jumbo kernel


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
