S3 key prefixes and performance impact on Ceph?

malinsk@xxxxxxxxxxxxx · Fri, 22 May 2020 12:38:57 -0000

I've just set up a Ceph cluster and I'm accessing it via the object gateway using the S3 API.

One thing I don't see documented anywhere: how does Ceph performance scale with S3 key prefixes?

In AWS S3, request throughput scales with the number of key prefixes (see: https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html). I picture the keys as a nested hash table or as the nodes of a prefix tree, where objects sharing a prefix are stored in closer proximity at the hardware level, so you want to spread reads evenly over prefixes to avoid parallel I/O being concentrated on the same hot spots.

So, for example, if my access pattern regularly involves scanning data across multiple dates for a single city, this key structure would be more effective, since each date falls under a different prefix: `yyyymmdd/city/data.csv`. If instead my access pattern involves scanning across different cities on a single date, `city/yyyymmdd/data.csv` would be more effective.
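
To make the two layouts concrete, here is a minimal Python sketch that counts how many distinct leading prefixes each access pattern touches. The layout names, the helper functions, and the sample dates and cities are all made up for illustration. It only counts prefixes in key strings and says nothing about how S3 or Ceph actually place objects.

```python
# Illustration only: counts distinct leading key prefixes per access pattern.
# Layout names, helpers, and sample values are hypothetical.

def make_key(layout, date, city):
    """Build an object key for one of the two layouts discussed above."""
    if layout == "date_first":
        return f"{date}/{city}/data.csv"
    if layout == "city_first":
        return f"{city}/{date}/data.csv"
    raise ValueError(f"unknown layout: {layout}")

def leading_prefixes(keys):
    """Return the set of first path components across the given keys."""
    return {key.split("/", 1)[0] for key in keys}

dates = ["20200520", "20200521", "20200522"]
cities = ["berlin", "oslo"]

# Pattern A: scan many dates for a single city.
many_dates = {
    layout: leading_prefixes(make_key(layout, d, "oslo") for d in dates)
    for layout in ("date_first", "city_first")
}

# Pattern B: scan many cities on a single date.
many_cities = {
    layout: leading_prefixes(make_key(layout, "20200522", c) for c in cities)
    for layout in ("date_first", "city_first")
}

for name, result in (("many dates", many_dates), ("many cities", many_cities)):
    for layout, prefixes in result.items():
        print(f"{name:12s} {layout:11s} -> {len(prefixes)} leading prefix(es)")
```

Under my mental model, the layout that touches more leading prefixes for a given scan is the one that spreads the reads. Whether that matters at all for the object gateway is exactly what I'm asking.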

How about Ceph? Does the naming convention of the key prefixes affect the performance of Ceph's object gateway, or does it treat the full object "paths" as a completely flat namespace?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx