Re: Radosgw huge traffic to index bucket compared to incoming requests

Hi,

huge read amplification on the index pool is unfortunately normal: the
complexity of a listing request is O(n), where n is the number of objects in
the bucket.
I've worked on many clusters with huge buckets, and 10 Gbit/s of
network traffic between the OSDs and radosgw is unfortunately not unusual
when running a lot of listing requests.

The problem is that radosgw needs to read *all* shards of a bucket's index
for each listing request, and the number of shards is the number of objects
divided by 100k by default.
It's a little bit better in Octopus, but still not great for huge buckets.
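To make that concrete, here's a back-of-the-envelope model (my own sketch,
not Ceph code; the 100k default corresponds to the rgw_max_objs_per_shard
option):

```python
# Rough model of RGW bucket-index read amplification. Assumptions:
# default of ~100k objects per shard, and every listing request
# having to consult every shard, as described above.
OBJS_PER_SHARD_DEFAULT = 100_000

def default_shard_count(num_objects: int) -> int:
    """Approximate default shard count: objects / 100k, rounded up, min 1."""
    return max(1, -(-num_objects // OBJS_PER_SHARD_DEFAULT))  # ceil division

def shard_reads_per_second(num_objects: int, listings_per_sec: float) -> float:
    """Index-shard reads generated per second by listing traffic alone."""
    return default_shard_count(num_objects) * listings_per_sec

# A 200M-object bucket with only 10 listings/s already touches
# 2000 shards per request -> 20000 shard reads/s on the index pool.
print(default_shard_count(200_000_000))         # 2000
print(shard_reads_per_second(200_000_000, 10))  # 20000.0
```

So even modest listing traffic on a very large bucket multiplies into a huge
number of index-shard reads, which is consistent with the op/s numbers quoted
below.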

My experience building rgw setups for huge buckets (> 200 million objects)
can be summed up as:

* use *good* NVMe disks for the index pool (very good experiences with the
Samsung PM1725a; I've seen these do > 50k IOPS during recoveries)
* it can be beneficial to have a larger number of OSDs handling the load,
as huge RocksDB sizes can be a problem; that means it can be better to use
the NVMe disks as DB devices for HDDs and put the index pool there than to
run a dedicated NVMe-only pool on very few OSDs
* go for larger shards on large buckets; shard sizes of 300k - 600k objects
are perfectly fine on fast NVMe (the trade-off here is recovery speed/locked
objects vs. read amplification)
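For the last point, a quick sketch of what larger shards buy you (my own
arithmetic; the 500k figure just reflects the 300k - 600k range I suggest
above, not any official Ceph default):

```python
# Pick a shard count from a target objects-per-shard size.
# The 500k default here is this thread's recommendation for fast
# NVMe, not an official Ceph setting.
def shards_for(num_objects: int, objs_per_shard: int = 500_000) -> int:
    return max(1, -(-num_objects // objs_per_shard))  # ceil division

# 200M objects at ~500k/shard -> 400 shards instead of the default
# 2000, i.e. 5x less read amplification per listing request.
print(shards_for(200_000_000))           # 400
print(shards_for(200_000_000, 300_000))  # 667
```

A shard count like this can be applied manually with "radosgw-admin reshard
add --bucket=<name> --num-shards=<n>" followed by "radosgw-admin reshard
process" (check "radosgw-admin bucket stats" first to see where you are).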

I think the formula shards = bucket_size / 100k shouldn't apply to buckets
with >= 100 million objects; shards should become bigger as the bucket size
increases.
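One possible sub-linear policy, purely to illustrate what I mean (this is
not what Ceph implements):

```python
import math

# Illustrative policy: keep ~100k objects/shard up to 100M objects,
# then let the shard size grow with the square root of bucket size,
# so the shard count grows much more slowly than the bucket.
def suggested_shards(num_objects: int) -> int:
    if num_objects < 100_000_000:
        return max(1, -(-num_objects // 100_000))  # current behaviour
    objs_per_shard = int(100_000 * math.sqrt(num_objects / 100_000_000))
    return max(1, -(-num_objects // objs_per_shard))

print(suggested_shards(50_000_000))   # 500  (unchanged below 100M)
print(suggested_shards(100_000_000))  # 1000
print(suggested_shards(400_000_000))  # 2000 (linear rule would give 4000)
```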


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Thu, Jun 18, 2020 at 9:25 AM Mariusz Gronczewski <
mariusz.gronczewski@xxxxxxxxxxxx> wrote:

> Hi,
>
> we're using Ceph as S3-compatible storage to serve static files (mostly
> css/js/images + some videos) and I've noticed that there seems to be
> huge read amplification for the index pool.
>
> Incoming traffic is around 15k req/s (mostly sub-1MB requests), but the
> index pool is getting hammered:
>
> pool pl-war1.rgw.buckets.index id 10
>   client io 632 MiB/s rd, 277 KiB/s wr, 129.92k op/s rd, 415 op/s wr
>
> pool pl-war1.rgw.buckets.data id 11
>   client io 4.5 MiB/s rd, 6.8 MiB/s wr, 640 op/s rd, 1.65k op/s wr
>
> and it is getting an order of magnitude more requests than the data pool.
>
> running 15.2.3, nothing special in terms of tuning aside from
> disabling some logging so as not to overflow the logs.
>
> We had a similar test cluster on 12.x (and much slower hardware)
> getting similar traffic and didn't observe that magnitude of
> difference.
>
> when enabling debug logging on an affected OSD, all I get is spam of:
>
> 2020-06-17T12:35:05.700+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid
> #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> = 0
> 2020-06-17T12:35:05.700+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.700+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.700+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.700+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid
> #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> = 0
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.708+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.708+0200 7f80694c4700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.716+0200 7f806d4cc700 10
> bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid
> #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> = 0
> 2020-06-17T12:35:05.716+0200 7f806d4cc700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.716+0200 7f806d4cc700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10
> bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head
> #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
>
>
>
> --
> Mariusz Gronczewski (XANi) <xani666@xxxxxxxxx>
> GnuPG: 0xEA8ACE64
> http://devrandom.pl
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>



