Mariusz Gronczewski writes:
> listing itself is bugged in version
> I'm running: https://tracker.ceph.com/issues/45955

Ouch! Are your OSDs all running the same version as your RadosGW? The
message looks a bit as if your RadosGW might be a newer version than
the OSDs, so the new optimized bucket listing operation would be
missing the corresponding client<->OSD protocol extensions on the OSD
side.

> But yes, our structure is generally /bucket/prefix/prefix/file so there
> is not many big directories (we're migrating from GFS where that was a
> problem)

>> Paul Emmerich has written about performance issues with large buckets
>> on this list, see
>> https://lists.ceph.io/hyperkitty/list/dev@xxxxxxx/thread/36P62BOOCJBVVJCVUX5F5J7KYCGAAICV/
>>
>> Let's say that there are opportunities for further improvements.
>>
>> You could look for the specific queries that cause the high read load
>> in your system. Maybe there's something that can be done on the
>> client side. This could also provide input for Ceph development as
>> to what kinds of index operations are used by applications "in the
>> wild". Those might be worth optimizing first :-)

> Is there a way to debug which query exactly is causing that ?

What I usually do is grep through the HTTP request logs of the
front-end proxy/load balancer (Nginx in our case) and look for GET
requests on a bucket that have a long duration. It's a bit crude, I
know; a rough sketch of the idea is appended at the end of this
message.

(If someone knows better techniques for this, I'd also be interested!
Maybe something based on Jaeger/OpenTracing, or clever log
correlation?)

> Currently there is a lot of incoming traffic (mostly from aws cli sync)
> as we're migrating data over but that's at most hundreds of requests
> per sec.
>>
>> > running 15.2.3, nothing special in terms of tunning aside from
>> > disabling some logging as to not overflow the logs.
>>
>> > We've had similar test cluster on 12.x (and way slower hardware)
>> > getting similar traffic and haven't observed that magnitude of
>> > difference.
>>
>> Was your bucket index sharded in 12.x?

> we didn't touch default settings so I assume not ? "radosgw-admin
> metadata get" and "radosgw-admin bucket stat" doesn't say anything
> about shards on old cluster, while on new cluster there is from 11 to
> few hundred on the biggest buckets.

Yes, I think it's the sharding that causes the read amplification.

>> Hm, I don't understand enough about the operations that this
>> represents, but maybe one of the RadosGW developers can explain why a
>> single OSD would perform so many similar requests in such a short
>> timeframe.

> I'm getting similar logs on any osd/pg that takes part in the .index

Right, that's what I thought. Again, I can't tell whether these log
messages are to be expected... the repetitions look a bit odd.

Best regards,
-- Simon.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
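
Appended: a rough, illustrative sketch of the log filtering described
above, for anyone who wants a starting point. It assumes an Nginx
log_format whose request line is the usual quoted "METHOD /uri HTTP/1.x"
and whose last field is $request_time in seconds; the regex, the
threshold and the "looks like a listing" heuristic are all assumptions
that will need adjusting to your own log format and setup.

#!/usr/bin/env python3
"""Rough sketch: find slow bucket GET/listing requests in an Nginx access log.

Assumes a log_format whose request line is quoted ("GET /uri HTTP/1.1")
and whose last field is $request_time in seconds -- adjust LINE_RE to
match your own format.
"""

import re
import sys

# Hypothetical log line layout; edit to match your nginx log_format.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<uri>\S+) HTTP/[\d.]+".* (?P<reqtime>\d+\.\d+)$'
)

SLOW_SECONDS = 1.0  # arbitrary threshold; tune to taste


def main(path: str) -> None:
    with open(path, encoding="utf-8", errors="replace") as logfile:
        for line in logfile:
            m = LINE_RE.search(line)
            if not m or m.group("method") != "GET":
                continue
            if float(m.group("reqtime")) < SLOW_SECONDS:
                continue
            uri = m.group("uri")
            path_part, _, query = uri.partition("?")
            # For path-style requests, a URI with no object key
            # ("/bucket" or "/bucket/") is most likely a bucket listing;
            # listings also tend to carry prefix=, delimiter= or
            # list-type=2 query parameters. Virtual-hosted-style buckets
            # would need the Host header instead.
            looks_like_listing = (
                path_part.rstrip("/").count("/") <= 1
                or "list-type=" in query
                or "delimiter=" in query
            )
            if looks_like_listing:
                print(f"{m.group('reqtime'):>8}s  GET {uri}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "/var/log/nginx/access.log")

Run it against an access log (e.g. "python3 slow_bucket_gets.py
/var/log/nginx/access.log") and it prints the slow GETs that look like
bucket listings, slowest-threshold first field being the request time.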