Re: Radosgw huge traffic to index bucket compared to incoming requests


 



On 2020-06-18 at 10:51:31,
Simon Leinen <simon.leinen@xxxxxxxxx> wrote:

> Dear Mariusz,
> 
> > we're using Ceph as S3-compatible storage to serve static files
> > (mostly css/js/images + some videos) and I've noticed that there
> > seems to be huge read amplification for the index pool.
> 
> we have observed that too, under Nautilus (14.2.4-14.2.8).
> 
> > Incoming traffic magnitude is around 15k req/sec (mostly sub-1MB
> > requests), but the index pool is getting hammered:
> 
> > pool pl-war1.rgw.buckets.index id 10
> >   client io 632 MiB/s rd, 277 KiB/s wr, 129.92k op/s rd, 415 op/s wr
> 
> > pool pl-war1.rgw.buckets.data id 11
> >   client io 4.5 MiB/s rd, 6.8 MiB/s wr, 640 op/s rd, 1.65k op/s wr  
> 
> > and is getting an order of magnitude more requests
> 
> Our hypothesis is that this is due to the way that RadosGW maps bucket
> index queries (ListObjects/ListObjectsV2) to Rados-level operations
> against a *sharded* index.
> 
> For certain types of S3 index queries, the response must be collected
> from multiple (potentially all) shards of the index.
> 
> S3 index queries are always "bounded" by the response limit (1000
> keys by default). But when your index is distributed over, let's say,
> 2000 shards, RadosGW must collect some data from those 2000 shards,
> then throw away most of what it gets, and return the next 1000 keys.
> This could explain the kind of read amplification that you are seeing.
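For illustration, here is a rough sketch of that merge in plain Python
(toy data and a made-up shard count, nothing taken from RadosGW's actual
code): to return 1000 keys in order, up to 1000 candidates have to be
pulled from every shard and most of them thrown away.

import heapq

# Toy model of a sharded bucket index: each shard holds a sorted slice of the
# bucket's keys. Shard count and key names are made up for illustration only.
NUM_SHARDS = 200
MAX_KEYS = 1000
shards = [sorted(f"obj-{i:07d}" for i in range(s, 200_000, NUM_SHARDS))
          for s in range(NUM_SHARDS)]

def list_objects(marker=""):
    # An ordered answer needs up to MAX_KEYS candidates from *every* shard,
    # because the next batch of keys could live anywhere...
    per_shard = [[k for k in shard if k > marker][:MAX_KEYS] for shard in shards]
    fetched = sum(len(p) for p in per_shard)
    # ...then everything is merged, cut off at MAX_KEYS, and the rest discarded.
    merged = heapq.merge(*per_shard)
    keys = [k for _, k in zip(range(MAX_KEYS), merged)]
    return keys, fetched

keys, fetched = list_objects()
print(f"returned {len(keys)} keys, read {fetched} index entries to get them")

With 200 shards and a 1000-key limit, that already reads 200,000 index
entries to return 1,000; at 2000 shards the ratio is even worse, which
looks like the shape of amplification we see on the index pool.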
> 
> (In practice, S3 index queries often use "prefix" and "delimiter" to
> emulate a hierarchical directory structure.  A recently merged change,
> https://github.com/ceph/ceph/pull/30272, should make such queries
> much more efficient in RadosGW (note that the change contains some
> extensions to the OSD-side Rados protocol).  But if I read it
> correctly, that change is already in the version you are using.)

Listing itself is bugged in the version I'm running:
https://tracker.ceph.com/issues/45955

But yes, our structure is generally /bucket/prefix/prefix/file, so there
aren't many big directories (we're migrating from GFS, where that was a
problem).
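For reference, a boto3 sketch of the kind of prefix/delimiter listing you
describe, against a layout like ours (bucket, prefix and endpoint names
below are placeholders, not our real ones):

import boto3

# Sketch of a prefix/delimiter listing against a /bucket/prefix/prefix/file
# layout. Endpoint, bucket and prefix are placeholders, not real names.
s3 = boto3.client("s3", endpoint_url="https://rgw.example.com")

resp = s3.list_objects_v2(
    Bucket="example-bucket",
    Prefix="static/css/",  # one "directory" level
    Delimiter="/",         # deeper levels come back as CommonPrefixes
    MaxKeys=1000,
)
for obj in resp.get("Contents", []):
    print(obj["Key"])
for p in resp.get("CommonPrefixes", []):
    print(p["Prefix"], "(subdirectory)")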

> Paul Emmerich has written about performance issues with large buckets
> on this list, see
> https://lists.ceph.io/hyperkitty/list/dev@xxxxxxx/thread/36P62BOOCJBVVJCVUX5F5J7KYCGAAICV/
> 
> Let's say that there are opportunities for further improvements.
> 
> You could look for the specific queries that cause the high read load
> in your system.  Maybe there's something that can be done on the
> client side.  This could also provide input for Ceph development as
> to what kinds of index operations are used by applications "in the
> wild".  Those might be worth optimizing first :-)

Is there a way to debug exactly which query is causing that?

Currently there is a lot of incoming traffic (mostly from aws cli sync)
as we're migrating data over, but that's at most hundreds of requests
per second.
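For the client side at least, botocore's debug logging shows every call a
boto3-based tool issues, which would tell us what kind of list requests
the sync jobs generate (sketch below; endpoint and bucket names are
placeholders, and the aws cli itself has a --debug flag that prints
similar detail):

import logging
import boto3

# Client-side view of the traffic: botocore's debug logging prints every
# operation (ListObjectsV2, GetObject, ...) with its parameters, so it shows
# exactly what a boto3-based sync job asks RadosGW for. Endpoint and bucket
# names are placeholders.
boto3.set_stream_logger("botocore", logging.DEBUG)

s3 = boto3.client("s3", endpoint_url="https://rgw.example.com")
s3.list_objects_v2(Bucket="example-bucket", Prefix="static/")
# The logged query strings can then be correlated with the index-pool read
# load on the Ceph side.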

> 
> > running 15.2.3, nothing special in terms of tunning aside from
> > disabling some logging as to not overflow the logs.  
> 
> > We've had similar test cluster on 12.x (and way slower hardware)
> > getting similar traffic and haven't observed that magnitude of
> > difference.  
> 
> Was your bucket index sharded in 12.x?

We didn't touch the default settings, so I assume not? "radosgw-admin
metadata get" and "radosgw-admin bucket stats" don't say anything about
shards on the old cluster, while on the new cluster there are from 11 up
to a few hundred shards on the biggest buckets.
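To check that in one go, a small sketch that shells out to radosgw-admin;
it assumes "radosgw-admin bucket stats" emits a JSON array that includes a
num_shards field (recent releases seem to), and just prints "not reported"
where it doesn't:

import json
import subprocess

# Sketch: print the shard count of every bucket. Assumes "radosgw-admin
# bucket stats" emits a JSON array and that the output includes a
# "num_shards" field (older releases may not report it at all).
out = subprocess.run(
    ["radosgw-admin", "bucket", "stats"],
    check=True, capture_output=True, text=True,
).stdout

for bucket in json.loads(out):
    print(bucket.get("bucket"), bucket.get("num_shards", "not reported"))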


> 
> > when enabling debug on an affected OSD, I only get spam of:
> 
> > 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# = 0
> > 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> > 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> > 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> > 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> > 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> > 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> > 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> > 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head#
> > 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# = 0
> > 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> > 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> > 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> > 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> > 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> > 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> > 2020-06-17T12:35:05.708+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> > 2020-06-17T12:35:05.708+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head#
> > 2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# = 0
> > 2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> > 2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> > 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> > 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> > 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> > 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> > 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> > 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head#
> >  
> 
> Hm, I don't understand enough about the operations that this
> represents, but maybe one of the RadosGW developers can explain why a
> single OSD would perform so many similar requests in such a short
> timeframe.

I'm getting similar logs on any OSD/PG that takes part in the index pool.
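If I understand the logs correctly, this is just bucket listing at the
rados level: each .dir.<bucket-instance>.<shard> object in the index pool
keeps its entries in omap, and a listing walks them with an omap iterator,
which is what the get_omap_iterator lines are. A minimal python-rados
sketch of that access pattern (pool and object names copied from the log
above):

import rados

# Minimal sketch of what a bucket-index read looks like at the rados level:
# one index shard is a .dir.* object whose entries live in omap, and listing
# iterates those omap keys (the get_omap_iterator calls in the OSD log).
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("pl-war1.rgw.buckets.index")

shard_obj = ".dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214"
with rados.ReadOpCtx() as read_op:
    it, ret = ioctx.get_omap_vals(read_op, "", "", 10)  # first 10 entries
    ioctx.operate_read_op(read_op, shard_obj)
    for key, _val in it:
        print(key)

ioctx.close()
cluster.shutdown()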

Cheers


-- 
Mariusz Gronczewski (XANi) <xani666@xxxxxxxxx>
GnuPG: 0xEA8ACE64
http://devrandom.pl



