RADOS as a simple object storage

kas@xxxxxxxxxx (Jan Kasprzak) · Mon, 20 Feb 2017 15:46:09 +0100

	Hello, world!

I have been using CEPH RBD for a year or so as a virtual machine storage
backend, and I am thinking about moving another of our subsystems to CEPH.

The subsystem in question is a simple replicated object storage,
currently implemented in custom C code written by yours truly. My question
is whether implementing such a thing on top of a CEPH RADOS pool and librados
is feasible, and what layout and optimizations you would suggest.

Our object storage indexes objects by a numeric ID. The access methods
are creating, reading and deleting objects. Objects are never modified
in place; instead, they are deleted and an object with a new ID is created.
We also keep a hash of each object's contents and use it to guard against
bit rot: the objects are scrubbed periodically, and if a checksum mismatch
is discovered, the object is restored from another replica.

Here are some statistics from our biggest instance of the object storage:

objects stored: 100_000_000
< 1024 bytes:    10_000_000
1k-64k bytes:    80_000_000
64k-4M bytes:    10_000_000
4M-256M bytes:    1_000_000
> 256M bytes:        10_000
biggest object:   15 GBytes

Would it be feasible to put 100M to 1G objects as native RADOS objects
into a single pool? Or should I take advantage of their read-only nature
and pack them into bigger objects ("packs") with metadata stored in a tmap
object, and repack those packs periodically as older objects get deleted?

I have also considered rados-gw, but it looks like too big a hammer
for my nail :-)

Thanks for your suggestions,

-Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
"Assuming that OpenSSL is written as carefully as Wietse's own code,
every 1000 lines introduce one additional bug into Postfix."   --TLS_README