Hi!

* Gregory Farnum <gfarnum@xxxxxxxxxx> [2018-06-28 19:31:09 -0700]:
> That’s close but not *quite* right. It’s not that Ceph will explicitly
> “fall back” to replication. In most (though perhaps not all) erasure codes,
> what you’ll see is full sized parity blocks, a full store of the data (in
> the default Reed-Solomon that will just be full-sized chunks up to however
> many are needed to store it fully in a single copy), and the remaining data
> chunks (out of the k) will have no data. *But* Ceph will keep the “object
> info” metadata in each shard, so all the OSDs in a PG will still witness
> all the writes.

That makes sense. To make sure this is what's actually happening, and
combining this insight with the info Paul Emmerich gave about the BlueStore
min alloc size, I've done an analysis of how many stripes my objects would
take, and how much space usage that would incur.

So far, I've loaded 92.4 million objects, and I've run the analysis over a
random sample of 10 million of them. Counting the number of 5*64k stripes
each object would take, and multiplying that by 1.4 (sketched below), I get
a space usage estimate that's within a percent of the actual usage, so,
yeah, it seems you're both right. Ouch :-)

> > If my understanding is correct, I'll end up adding size tiering on my
> > object storage layer, shuffling objects in two pools with different
> > settings according to their size. That's not too bad, but I'd like to
> > make sure I'm not completely misunderstanding something.
>
> That’s probably a reasonable response, especially if you are already
> maintaining an index for other purposes!

I guess that's back to the drawing board for us, because the 64k minimal
allocation will also happen on basic replicated pools, and we can store a
_lot_ of objects in a 64k block ;)

Thanks for your insights,
--
Nicolas Dandrimont

BOFH excuse #259:
Someone's tie is caught in the printer, and if anything else gets
printed, he'll be in it too.
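[Editorial sketch of the estimate described above. It assumes a k=5, m=2
erasure-coded pool (inferred from the 5*64k stripe width and the 1.4 =
(k+m)/k overhead factor mentioned in the thread) and BlueStore's default
64 KiB min_alloc_size. The function name and sample sizes are hypothetical,
not taken from the original analysis.]

```python
import math

# Assumed pool parameters, inferred from the thread (hypothetical):
# k=5 data chunks, m=2 coding chunks, BlueStore min_alloc_size of 64 KiB.
K = 5
M = 2
MIN_ALLOC = 64 * 1024          # bytes per allocated chunk on each shard
STRIPE_WIDTH = K * MIN_ALLOC   # 5 * 64 KiB of user data per stripe
EC_OVERHEAD = (K + M) / K      # (k+m)/k = 7/5 = 1.4

def estimated_raw_usage(object_size: int) -> int:
    """Estimate the raw bytes one object consumes in the EC pool.

    Each object is assumed to occupy a whole number of k*64 KiB stripes
    (at least one, even for tiny objects), and every stripe carries the
    (k+m)/k erasure-coding overhead.
    """
    stripes = max(1, math.ceil(object_size / STRIPE_WIDTH))
    return int(stripes * STRIPE_WIDTH * EC_OVERHEAD)

# A 10 KiB object still pays for a full stripe: 5 * 64 KiB * 1.4 = 448 KiB.
print(estimated_raw_usage(10 * 1024))    # 458752 bytes (448 KiB)
# A 400 KiB object spans two stripes: 2 * 320 KiB * 1.4 = 896 KiB.
print(estimated_raw_usage(400 * 1024))   # 917504 bytes (896 KiB)
```

Summing estimated_raw_usage over the sampled objects (and scaling up to the
full object count) would reproduce the kind of estimate described in the
message; the 10 KiB example illustrates why a workload of many small
objects inflates raw usage so badly under these settings.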