rgw: matching small objects to pools with small min_alloc_size

Casey Bodley <cbodley@xxxxxxxxxx> · Wed, 18 Aug 2021 15:38:47 -0400

in the rgw refactoring meetings, we've been discussing ways to improve
space utilization for workloads of mixed object sizes

i think it's worth bringing this up in Mark's performance call as well,
to explore other options from the osd/librados perspective

most of our discussion so far has centered around ways to use s3's
storage classes (which rgw maps to different rados pools) as a way to
direct object uploads to an appropriately-configured pool depending on
the object's size. for example, all objects under 1M would be assigned
to a SMALL storage class, while the rest go to LARGE. doing this
directly is tricky, because http requests don't always tell us the
full object size up front. this strategy could also lead to confusion
in s3 applications, because the storage class is a visible part of the
protocol and clients expect to have control over it
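
to illustrate why the direct approach is tricky, here's a minimal python sketch of size-based selection. the function name and the SMALL_LIMIT constant are hypothetical, not rgw code; the 1M threshold and the SMALL/LARGE names come from the example above. when the request doesn't declare a length up front, there's nothing to decide on:

```python
from typing import Optional

# 1M threshold from the example above (illustrative, not an rgw setting)
SMALL_LIMIT = 1 << 20


def pick_storage_class(content_length: Optional[int]) -> Optional[str]:
    """Pick SMALL or LARGE from the declared size.

    Returns None when the request gave no size up front, which is
    the case that makes this strategy hard to apply directly.
    """
    if content_length is None:
        return None
    return "SMALL" if content_length < SMALL_LIMIT else "LARGE"
```

any real implementation would also need a fallback for the None case, and that fallback is exactly where small objects could land in the wrong pool.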

you can read more about storage classes and rgw pool placement in
https://docs.ceph.com/en/latest/radosgw/placement/. essentially, each
bucket chooses a 'placement target' on creation, and that placement
target defines which storage classes are available for its object
uploads. each storage class defines the rados pool to use for the
object data. each placement target has a default storage class called
STANDARD which is used for object uploads that don't specify a storage
class. this STANDARD pool is also used to store all of the bucket's
head objects, regardless of their storage class. objects uploaded to
the STANDARD storage class store up to 4MB of data in the head object,
and the rest in tail objects of the same pool. objects uploaded to
other storage classes only store metadata in the head object - all of
their data goes in tail objects in their own pool
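
the head/tail rules above can be sketched as a small python model. this is only an illustration of the description in this mail, not rgw's actual code: the Layout type, the function name, and the pool names in the usage note are all made up, and the 4MB limit is hardcoded here for simplicity:

```python
from dataclasses import dataclass

# STANDARD head objects hold up to 4MB of data (per the description above)
HEAD_MAX = 4 * 1024 * 1024


@dataclass
class Layout:
    head_pool: str
    head_bytes: int   # object data stored in the head object
    tail_pool: str
    tail_bytes: int   # object data stored in tail objects


def current_layout(size: int, storage_class: str, pools: dict) -> Layout:
    """Model where an upload's data lands under the existing rules.

    `pools` maps storage class name -> rados pool name (hypothetical).
    """
    standard_pool = pools["STANDARD"]
    if storage_class == "STANDARD":
        # head holds the first 4MB, the rest goes to tails in the same pool
        head_bytes = min(size, HEAD_MAX)
        return Layout(standard_pool, head_bytes, standard_pool, size - head_bytes)
    # other classes: head (metadata only) stays in the STANDARD pool,
    # all data goes to tails in the class's own pool
    return Layout(standard_pool, 0, pools[storage_class], size)
```

for example, with pools = {"STANDARD": "std.data", "COLD": "cold.data"}, a 6MB STANDARD upload puts 4MB in the head and 2MB in tails, both in std.data, while a 6MB COLD upload puts a data-less head in std.data and all 6MB in cold.data.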

in today's call, Yehuda made the observation that for this use case,
it would be ideal to put all head objects in a pool with small
min_alloc_size and all tails in larger-sized pools. this way, even
though we don't necessarily know the full object size up front, we'd
still place all small objects in the correctly-sized pool, with larger
objects spilling over into their own tail pools
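
as a rough illustration of why the pool's min_alloc_size matters for small objects, here's a simplified python model that just rounds each object up to a whole number of allocation units. it ignores replication, erasure coding, compression and metadata overhead, and the 4K/64K values in the usage note are only example settings:

```python
def allocated_bytes(size: int, min_alloc_size: int) -> int:
    """Space consumed when `size` bytes are rounded up to whole
    allocation units of `min_alloc_size` bytes (simplified model)."""
    if size == 0:
        return 0
    units = -(-size // min_alloc_size)  # ceiling division
    return units * min_alloc_size
```

under this model a 1K object takes 64K of space with a 64K min_alloc_size, but only 4K with a 4K min_alloc_size.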

this doesn't quite match up with our existing implementation though,
because we put the STANDARD storage class' tail objects in the same
pool as the head objects, and other storage classes only store data in
the tails

so i suggested an additional option to specify a 'head object pool' in
the placement target that's independent of its storage classes. when
specified, all head objects would be written to that pool instead,
along with a configurable amount of data. benefits of this strategy
would be that it preserves the storage class behavior that clients
expect, and enables an optional configuration for a space-optimized
head object pool
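
a minimal python sketch of how that option might behave, for discussion only. nothing here exists in rgw: the function, its parameters (head_pool, head_data_max) and the pool names in the usage note are hypothetical, and it assumes that tail data keeps going to the storage class's own pool:

```python
from typing import Optional, Tuple

# existing behavior: STANDARD heads hold up to 4MB of data
STANDARD_HEAD_MAX = 4 * 1024 * 1024

Placement = Tuple[str, int]  # (pool name, bytes of object data)


def proposed_layout(size: int, storage_class: str, pools: dict,
                    head_pool: Optional[str] = None,
                    head_data_max: int = 0) -> Tuple[Placement, Placement]:
    """Return ((head pool, head bytes), (tail pool, tail bytes)).

    `pools` maps storage class name -> rados pool name (hypothetical).
    """
    if head_pool is None:
        # option not configured: keep the existing behavior
        head_bytes = min(size, STANDARD_HEAD_MAX) if storage_class == "STANDARD" else 0
        return (pools["STANDARD"], head_bytes), (pools[storage_class], size - head_bytes)
    # option configured: every head goes to the head pool with up to
    # head_data_max bytes of data, whatever the storage class
    head_bytes = min(size, head_data_max)
    return (head_pool, head_bytes), (pools[storage_class], size - head_bytes)
```

with head_pool="head.small" and head_data_max=1M, a 100K upload lands entirely in head.small without needing to know its size up front, while a 5M upload to a COLD class keeps 1M in head.small and spills the other 4M into COLD's pool. the storage class the client asked for is unchanged in both cases.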
