RADOS as a simple object storage

gfarnum@xxxxxxxxxx (Gregory Farnum) · Mon, 20 Feb 2017 09:24:50 -0800

On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak <kas at fi.muni.cz> wrote:
>         Hello, world!
>
> I have been using CEPH RBD for a year or so as a virtual machine storage
> backend, and I am thinking about moving another of our subsystems to CEPH:
>
> The subsystem in question is a simple replicated object storage,
> currently implemented in custom C code by yours truly. My question
> is whether implementing such a thing on top of a CEPH RADOS pool and librados
> is feasible, and what layout and optimizations you would suggest.
>
> Our object storage indexes objects by a numeric ID. The access methods
> involve creating, reading and deleting objects. Objects are never modified
> in place; instead they are deleted and an object with a new ID is created.
> We also keep a hash of each object's contents and use it to prevent bit rot
> - the objects are scrubbed periodically, and if a checksum mismatch is
> discovered, the object is restored from another replica.
>
> Here are some statistics from our biggest instance of the object storage:
>
> objects stored: 100_000_000
> < 1024 bytes:    10_000_000
> 1k-64k bytes:    80_000_000
> 64k-4M bytes:    10_000_000
> 4M-256M bytes:    1_000_000
> > 256M bytes:        10_000
> biggest object:   15 GBytes
>
> Would it be feasible to put 100M to 1G objects as native RADOS objects
> into a single pool?

This is well outside the range of object sizes RADOS is targeted at or
tested with; I'd expect issues. Given the requirements you've mentioned,
you might want to look at libradosstriper.
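
As a rough illustration of the general idea behind striping (this is not
libradosstriper's API or its actual layout), a large logical object can be
cut into fixed-size pieces that are stored separately and reassembled on
read, with a whole-object hash checked afterwards. The 4 MiB chunk size,
the "<id>.<index>" piece names, and the choice of SHA-256 are all
assumptions made for this sketch only.

```python
import hashlib

# Assumed chunk size for the sketch (4 MiB); not a value taken from this thread.
CHUNK_SIZE = 4 * 1024 * 1024


def split_object(object_id, data, chunk_size=CHUNK_SIZE):
    """Split one logical object into fixed-size pieces.

    Returns a dict mapping a hypothetical piece name ("<id>.<index>")
    to the bytes of that piece, in offset order.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pieces = {}
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        # Zero-padded index so that names sort in offset order.
        pieces[f"{object_id}.{index:08d}"] = data[offset:offset + chunk_size]
    if not pieces:
        # An empty object still gets one empty piece so its ID stays addressable.
        pieces[f"{object_id}.{0:08d}"] = b""
    return pieces


def join_object(pieces, expected_sha256):
    """Reassemble the pieces and verify the whole-object checksum."""
    data = b"".join(pieces[name] for name in sorted(pieces))
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch: object may be corrupted")
    return data
```

In a real deployment each piece would be written to and read from the pool
rather than kept in a dict; the dict only stands in for that storage here.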

> Or should I consider their read-only nature and pack them
> into bigger pack objects, with metadata stored in a tmap object, and repack
> those packed objects periodically as older objects get deleted?

Definitely don't do that, see above. ;)
-Greg