Hi,
On 7/18/19 8:57 AM, Ravi Patel wrote:
We’ve been debugging this for a while. The data pool was originally EC
backed, with the bucket indexes on HDD pools. Moving the metadata to
SSD-backed pools improved usability and consistency, and the change
from EC to replicated improved RADOS-layer IOPS by 4x, but didn't
seem to affect RGW IOPS very much. Based on that, I think
there is a configuration error somewhere.
We can try it, but we're not sure that the hardware is the bottleneck.
It would be good to understand if there are any performance counters or
metrics we should be looking at to see where the issue might be.
Just my 2 ct:
What kind of authentication do you use within RGW? Local authentication
(based on username/password stored in RGW metadata), Keystone or LDAP?
If you do not use local authentication, each request has to be validated
against an external source. In the case of Keystone this means the RGW has
to send the request and authentication information to Keystone for
validation, since RGW itself does not have access to the plaintext
password/secret key. This adds an extra round trip for each request.
If this upcall uses an SSL/TLS-based connection, you might even need
to do a complete handshake for each upcall (I'm not sure whether RGW
uses keep-alive and persistent connections in this case; maybe a
developer can comment?).
For local authentication I'm also not sure whether the user metadata is
cached; if it isn't, each request requires another round trip to the Ceph
cluster to retrieve the stored credentials.
If you are using Keystone, you can test this by creating a local user +
bucket and benchmarking that account against a Keystone-based account.
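For that comparison, timing a batch of small PUTs with boto3 against the two
accounts is usually enough to show per-request auth overhead. A rough sketch,
assuming Python with boto3; the endpoint, bucket names and credentials are
placeholders you would substitute with a local RGW user's keys and a
Keystone-backed EC2 credential:

import time
import boto3

ENDPOINT = "http://rgw.example.com:7480"   # placeholder RGW endpoint

def bench(access_key, secret_key, bucket, label, n=200):
    # Plain S3 client pointed at RGW; the credentials decide which auth
    # path is exercised (local RGW user vs. Keystone-backed EC2 credential).
    s3 = boto3.client("s3", endpoint_url=ENDPOINT,
                      aws_access_key_id=access_key,
                      aws_secret_access_key=secret_key)
    s3.create_bucket(Bucket=bucket)   # each account writes into its own bucket
    start = time.time()
    for i in range(n):
        s3.put_object(Bucket=bucket, Key=f"obj-{i}", Body=b"x" * 1024)
    elapsed = time.time() - start
    print(f"{label}: {n / elapsed:.1f} req/s ({elapsed / n * 1000:.1f} ms/req)")

bench("LOCAL_ACCESS_KEY", "LOCAL_SECRET_KEY", "authtest-local", "local user")
bench("EC2_ACCESS_KEY", "EC2_SECRET_KEY", "authtest-keystone", "keystone user")

If the local user is noticeably faster per request, the Keystone upcall (and
possibly its TLS handshakes) is a likely contributor to the RGW IOPS gap.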
Regards,
Burkhard