Hi,

huge read amplification on the index pool is unfortunately normal; the
complexity of a read request is O(n), where n is the number of objects in
that bucket. I've worked on many clusters with huge buckets, and 10 gbit/s
of network traffic between the OSDs and radosgw is unfortunately not
unusual when running a lot of listing requests.

The problem is that a list request needs to read *all* shards of the
bucket, and the number of shards is the number of objects divided by 100k
by default. It's a little bit better in Octopus, but still not great for
huge buckets.

My experiences with building rgw setups for huge (> 200 million object)
buckets can be summed up as:

* use *good* NVMe disks for the index pool (very good experiences with
  Samsung PM1725a; I've seen these things do > 50k iops during recoveries)
* it can be beneficial to have a larger number of OSDs handling the load,
  as huge rocksdb sizes can be a problem; that means it can be better to
  use the NVMe disks as DB devices for HDDs and put the index pool there
  than to run a dedicated NVMe-only pool on very few OSDs
* go for larger shards on large buckets; shard sizes of 300k - 600k
  objects are perfectly fine on fast NVMes (the trade-off here is
  recovery speed/locked objects vs. read amplification)

I think the formula shards = bucket_size / 100k shouldn't apply to buckets
with >= 100 million objects; shards should become bigger as the bucket
size increases.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Jun 18, 2020 at 9:25 AM Mariusz Gronczewski <
mariusz.gronczewski@xxxxxxxxxxxx> wrote:

> Hi,
>
> we're using Ceph as S3-compatible storage to serve static files (mostly
> css/js/images + some videos) and I've noticed that there seems to be
> huge read amplification for the index pool.
>
> Incoming traffic magnitude is around 15k req/sec (mostly sub-1MB
> requests), but the index pool is getting hammered:
>
> pool pl-war1.rgw.buckets.index id 10
>   client io 632 MiB/s rd, 277 KiB/s wr, 129.92k op/s rd, 415 op/s wr
>
> pool pl-war1.rgw.buckets.data id 11
>   client io 4.5 MiB/s rd, 6.8 MiB/s wr, 640 op/s rd, 1.65k op/s wr
>
> and is getting an order of magnitude more requests.
>
> running 15.2.3, nothing special in terms of tuning aside from
> disabling some logging so as to not overflow the logs.
>
> We've had a similar test cluster on 12.x (and way slower hardware)
> getting similar traffic and haven't observed that magnitude of
> difference.
>
> when enabling debug on an affected OSD I only get spam of
>
> 2020-06-17T12:35:05.700+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid
> #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> = 0
> 2020-06-17T12:35:05.700+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> [the same get_omap_iterator line repeats several more times for this
> shard object]
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid
> #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> = 0
> 2020-06-17T12:35:05.716+0200 7f806d4cc700 10
> bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid
> #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> = 0
> [followed by the same pattern of repeated get_omap_iterator calls for
> the shard objects ...454759.8.222 and ...454759.8.151]
>
> --
> Mariusz Gronczewski (XANi) <xani666@xxxxxxxxx>
> GnuPG: 0xEA8ACE64
> http://devrandom.pl
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
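The shard-count arithmetic Paul describes (one shard per 100k objects by
default, every shard read on each listing) can be sketched as follows.
This is an illustrative calculation only, not RGW source code; the
function name and the 500k-objects-per-shard alternative are assumptions
chosen to mirror his 300k - 600k suggestion, and RGW's real resharding
logic may round differently:

```python
import math

# Hypothetical sketch: estimate the bucket index shard count under the
# default heuristic (~100k objects per shard) versus a larger per-shard
# target for very big buckets on fast NVMe.
def shard_count(num_objects, objs_per_shard=100_000):
    return max(1, math.ceil(num_objects / objs_per_shard))

# A list request must read every shard, so the per-listing read work
# grows with the shard count.
for objects in (10_000_000, 200_000_000, 1_000_000_000):
    default = shard_count(objects)            # default: ~100k per shard
    larger = shard_count(objects, 500_000)    # e.g. 500k per shard
    print(f"{objects:>13,} objects: {default:6} shards (default) "
          f"vs {larger:5} shards (500k/shard)")
```

The point of the comparison: at 200 million objects the default heuristic
yields thousands of shards, all of which must be scanned per listing,
which is where the read amplification on the index pool comes from.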