Hi,

The current behavior is effectively that of a flat namespace. As the number of objects in a bucket becomes large, RGW partitions (shards) the bucket index, and each key is placed in a partition according to a hash of its name. Reads on the partitions are done in parallel (unless unordered listing, an RGW extension, is requested).

Matt

On Fri, May 22, 2020 at 8:39 AM <malinsk@xxxxxxxxxxxxx> wrote:
>
> I've just set up a Ceph cluster and I'm accessing it via the object gateway with the S3 API.
>
> One thing I don't see documented anywhere is how Ceph performance scales with S3 key prefixes.
>
> In AWS S3, performance scales linearly with key prefix (see: https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html). I see the keys as a nested hash table or nodes of a prefix tree, where each prefix is stored in closer proximity at a hardware level - you want to spread reads evenly over prefixes to avoid parallel I/O being concentrated on the same hot spots.
>
> So for example if my access pattern regularly involves scanning data through multiple dates for a single city, this key structure will be more effective: `yyyymmdd/city/data.csv`. Whereas if my access pattern involves scanning through different cities on a single date, `city/yyyymmdd/data.csv` would be more effective.
>
> How about Ceph? Does the naming convention of the key prefixes have an effect on the performance of Ceph's object gateway, or does it treat the full object "paths" as a completely flat namespace?
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>

--

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
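
To illustrate the flat-namespace point in the reply above, here is a minimal Python sketch of hash-based placement of keys into index partitions. It is only an illustration under stated assumptions: the shard count, the use of SHA-256, and the helper names `shard_for_key` and `shard_histogram` are all hypothetical and are not RGW's actual hash function or code. The idea it shows is that when placement depends on a hash of the whole key name, `yyyymmdd/city/...` and `city/yyyymmdd/...` layouts should both spread across partitions in much the same way.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16  # hypothetical shard count, chosen only for this illustration


def shard_for_key(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a full object key to a shard number.

    Assumption: SHA-256 stands in for whatever hash the gateway really uses.
    The whole key name is hashed, so the prefix gets no special treatment.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def shard_histogram(keys, num_shards: int = NUM_SHARDS) -> Counter:
    """Count how many keys land in each shard."""
    return Counter(shard_for_key(k, num_shards) for k in keys)


# Two naming conventions for the same 28 days x 50 cities of objects.
date_first = [f"202005{d:02d}/city{c}/data.csv"
              for d in range(1, 29) for c in range(50)]
city_first = [f"city{c}/202005{d:02d}/data.csv"
              for d in range(1, 29) for c in range(50)]

for name, keys in (("date-first", date_first), ("city-first", city_first)):
    hist = shard_histogram(keys)
    print(name, "min/max keys per shard:", min(hist.values()), max(hist.values()))
```

In this toy model, keys that share a prefix such as `20200501/` are scattered over many shards rather than grouped together, which is the sense in which the namespace behaves as flat.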