Re: Radosgw huge traffic to index bucket compared to incoming requests

Dear Mariusz,

> we're using Ceph as S3-compatible storage to serve static files (mostly
> css/js/images + some videos) and I've noticed that there seem to be
> huge read amplification for index pool.

We have observed that too, under Nautilus (14.2.4-14.2.8).

> Incoming traffic magnitude is around 15k req/sec (mostly sub-1MB
> requests), but the index pool is getting hammered:

> pool pl-war1.rgw.buckets.index id 10
>   client io 632 MiB/s rd, 277 KiB/s wr, 129.92k op/s rd, 415 op/s wr

> pool pl-war1.rgw.buckets.data id 11
>   client io 4.5 MiB/s rd, 6.8 MiB/s wr, 640 op/s rd, 1.65k op/s wr

> and is getting an order of magnitude more requests

Our hypothesis is that this is due to the way that RadosGW maps bucket
index queries (ListObjects/ListObjectsV2) to Rados-level operations
against a *sharded* index.

For certain types of S3 index queries, the response must be collected
from multiple (potentially all) shards of the index.

S3 index queries are always "bounded" by the response limit (1000 keys
by default). But when your index is distributed over, let's say, 2000
shards, RadosGW must collect some data from each of those 2000 shards,
throw away most of what it gets, and return only the next 1000 keys.
This could explain the kind of read amplification you are seeing.
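To make that concrete, here is a toy sketch in Python (not RadosGW code;
the shard layout and numbers are made up) of what a merge-across-shards
listing has to do:

import heapq

def list_objects(shards, marker="", max_keys=1000):
    """shards: sorted key lists, standing in for the omap of each index shard."""
    per_shard = []
    for shard in shards:
        # Each shard is asked for up to max_keys entries after the marker ...
        per_shard.append([k for k in shard if k > marker][:max_keys])
    fetched = sum(len(chunk) for chunk in per_shard)
    # ... but only the first max_keys of the merged result go back to the client.
    merged = heapq.merge(*per_shard)
    returned = [k for _, k in zip(range(max_keys), merged)]
    print(f"fetched {fetched} index entries to return {len(returned)}")
    return returned

# With 2000 shards, a single ListObjects can touch up to 2000 * 1000 index
# entries in order to return 1000 keys, i.e. roughly a 2000x over-read in
# the worst case.

(The real implementation is more careful than this, but the worst case
looks similar.)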

(In practice, S3 index queries often use "prefix" and "delimiter" to
emulate a hierarchical directory structure.  A recently merged change,
https://github.com/ceph/ceph/pull/30272 , should make such queries much
more efficient in RadosGW (note that the change contains some extensions
to the OSD-side Rados protocol).  But if I read it correctly, that
change is already in the version you are using.)
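For clarity, this is the kind of query I mean: a typical client-side
"directory" listing with prefix and delimiter, here with boto3 (endpoint
and bucket names are made up):

import boto3

s3 = boto3.client("s3", endpoint_url="https://rgw.example.com")  # hypothetical endpoint

resp = s3.list_objects_v2(
    Bucket="static-assets",  # hypothetical bucket
    Prefix="css/",           # only keys under the pseudo-directory "css/"
    Delimiter="/",           # collapse deeper levels into CommonPrefixes
    MaxKeys=1000,
)
for obj in resp.get("Contents", []):
    print(obj["Key"])
for cp in resp.get("CommonPrefixes", []):
    print(cp["Prefix"], "(pseudo-subdirectory)")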

Paul Emmerich has written about performance issues with large buckets on
this list, see
https://lists.ceph.io/hyperkitty/list/dev@xxxxxxx/thread/36P62BOOCJBVVJCVUX5F5J7KYCGAAICV/

Let's just say there are still opportunities for further improvement.

You could look for the specific queries that cause the high read load in
your system.  Maybe there's something that can be done on the client
side.  This could also provide input for Ceph development as to what
kinds of index operations are used by applications "in the wild".  Those
might be worth optimizing first :-)
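If the traffic comes from an application you control, one low-effort way
to see which listing calls it actually issues is a botocore event hook on
the client. A sketch (endpoint name is illustrative):

import boto3
from collections import Counter

listing_calls = Counter()

def record_list_params(params, **kwargs):
    # Tally the Prefix/Delimiter combinations the application asks for.
    listing_calls[(params.get("Prefix"), params.get("Delimiter"))] += 1

s3 = boto3.client("s3", endpoint_url="https://rgw.example.com")  # hypothetical endpoint
s3.meta.events.register("provide-client-params.s3.ListObjects", record_list_params)
s3.meta.events.register("provide-client-params.s3.ListObjectsV2", record_list_params)

# ... drive the normal workload through this client, then:
for (prefix, delimiter), count in listing_calls.most_common(10):
    print(f"{count:6d}  Prefix={prefix!r}  Delimiter={delimiter!r}")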

> running 15.2.3, nothing special in terms of tuning aside from
> disabling some logging so as not to overflow the logs.

> We've had a similar test cluster on 12.x (and much slower hardware)
> getting similar traffic and haven't observed that magnitude of
> difference.

Was your bucket index sharded in 12.x?
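(If you want to check, "radosgw-admin bucket limit check" reports
per-bucket shard counts and fill status. A small sketch that pulls them
out of its JSON output; the field names are from memory, so please verify
against your version:)

import json
import subprocess

# "radosgw-admin bucket limit check" prints per-user JSON with one entry per
# bucket; field names below are from memory (Nautilus/Octopus); verify locally.
out = subprocess.run(
    ["radosgw-admin", "bucket", "limit", "check"],
    check=True, capture_output=True, text=True,
).stdout

for user in json.loads(out):
    for b in user.get("buckets", []):
        print(b.get("bucket"),
              "num_shards:", b.get("num_shards"),
              "objects_per_shard:", b.get("objects_per_shard"),
              "fill_status:", b.get("fill_status"))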

> when enabling debug on the affected OSD I only get spam of

> 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# = 0
> 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# = 0
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.708+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.708+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> 2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# = 0
> 2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#

Hmm, I don't understand enough about the operations these lines
represent, but maybe one of the RadosGW developers can explain why a
single OSD would perform so many near-identical requests in such a short
timeframe.
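In the meantime, a quick way to see whether those omap reads pile up on a
few index shard objects or are spread across the whole index is to count
them. A rough sketch based on the object names in your excerpt (feed it
the OSD log on stdin):

import re
import sys
from collections import Counter

# Count how often each bucket-index shard object (".dir.<bucket-id>.<shard>")
# appears in omap-related debug lines like the ones quoted above.
shard_re = re.compile(r"(\.dir\.[0-9a-f-]+\.\d+\.\d+\.\d+)")
hits = Counter()

for line in sys.stdin:
    if "get_omap_iterator" in line or "omap_get_header" in line:
        m = shard_re.search(line)
        if m:
            hits[m.group(1)] += 1

for shard_obj, count in hits.most_common(20):
    print(f"{count:8d}  {shard_obj}")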

Cheers,
-- 
Simon.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


