On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak <kas at fi.muni.cz> wrote:
> Hello, world!
>
> I have been using CEPH RBD for a year or so as a virtual machine storage
> backend, and I am thinking about moving another of our subsystems to CEPH.
>
> The subsystem in question is a simple replicated object storage,
> currently implemented in custom C code by yours truly. My question
> is whether implementing such a thing on top of a CEPH RADOS pool and librados
> is feasible, and what layout and optimizations you would suggest.
>
> Our object storage indexes objects by a numeric ID. The access methods
> are creating, reading and deleting objects. Objects are never modified
> in place; they are instead deleted and an object with a new ID is created.
> We also keep a hash of each object's contents and use it to prevent bit rot:
> the objects are scrubbed periodically, and if a checksum mismatch is
> discovered, the object is restored from another replica.
>
> Here are some statistics from our biggest instance of the object storage:
>
> objects stored:  100_000_000
> < 1024 bytes:     10_000_000
> 1k-64k bytes:     80_000_000
> 64k-4M bytes:     10_000_000
> 4M-256M bytes:     1_000_000
> > 256M bytes:         10_000
> biggest object:    15 GBytes
>
> Would it be feasible to put 100M to 1G objects as native RADOS objects
> into a single pool?

This is well outside the object size RADOS is targeted at or tested with;
I'd expect issues. Given the requirements you've mentioned, you might want
to look at libradosstriper.

> Or should I consider their read-only nature and pack them
> into bigger objects/packs with metadata stored in a tmap object, and repack
> those packed objects periodically as older objects get deleted?

Definitely don't do that, see above. ;)
-Greg
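
For readers following the thread, the store Jan describes (numeric IDs, create/read/delete only, a content hash per object, periodic scrubbing with repair from another replica) can be sketched roughly as below. This is a toy in-memory model only: it is not Jan's C code and it does not use librados or libradosstriper. The class name `ReplicatedStore`, the choice of SHA-256, and the default replica count are all assumptions made for illustration.

```python
import hashlib


def _digest(data):
    """Content hash used for bit-rot detection (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class ReplicatedStore:
    """Hypothetical in-memory stand-in for a replicated, write-once object store."""

    def __init__(self, n_replicas=2):
        # Each replica maps object ID -> (expected digest, payload bytes).
        self.replicas = [dict() for _ in range(n_replicas)]
        self.next_id = 1

    def create(self, data):
        """Store a new object on every replica and return its numeric ID."""
        oid = self.next_id
        self.next_id += 1
        for rep in self.replicas:
            rep[oid] = (_digest(data), data)
        return oid

    def _good_copy(self, oid, digest):
        """Return a payload whose hash matches `digest`, or None if none does."""
        for rep in self.replicas:
            entry = rep.get(oid)
            if entry is not None and _digest(entry[1]) == digest:
                return entry[1]
        return None

    def read(self, oid):
        """Return the first copy that passes its checksum; KeyError otherwise."""
        for rep in self.replicas:
            entry = rep.get(oid)
            if entry is not None and _digest(entry[1]) == entry[0]:
                return entry[1]
        raise KeyError(oid)

    def delete(self, oid):
        """Objects are never modified in place, only deleted."""
        for rep in self.replicas:
            rep.pop(oid, None)

    def scrub(self):
        """Re-hash every copy and repair mismatches from a healthy replica.

        Returns the number of copies that were repaired.
        """
        repaired = 0
        for rep in self.replicas:
            for oid, (digest, data) in list(rep.items()):
                if _digest(data) != digest:
                    good = self._good_copy(oid, digest)
                    if good is not None:
                        rep[oid] = (digest, good)
                        repaired += 1
        return repaired
```

In this sketch the stored digest is trusted and only the payload is checked, so a copy is repaired only when some other replica still holds data matching that digest. A real deployment on RADOS would rely on the cluster's own replication and scrubbing rather than reimplementing them in the client.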