It seems plausible to guess that you've hit some behavior that's O(N^2)
in the number of objects (for some object type or other). Perhaps "perf"
or a similar tool would give some insight into where the bottleneck is.
https://wiki.postgresql.org/wiki/Profiling_with_perf
Thanks for your quick reply!
I haven't used "perf" yet and decided to investigate a bit further with the tools I am more familiar with:
As I mentioned in my other post, I use the combination of "SET ROLE ...;" and "SET search_path = ...;" to switch to the requested tenant before executing actual statements. A typical request to my webapp executes the following sql statements:
1. SET ROLE tenant1;
2. SET search_path = tenant1;
3. -- execute various tenant1 related sql statements here
4. SET search_path = DEAULT;
5. RESET ROLE;
I activated logging of all statements for around 6 minutes in production and analyzed the duration of parse, bind and execute for the statements 1, 2, 4 and 5 above. I just summed parse, bind and execute and calculated the average of them.
"SET ROLE ...;" -> 7.109 ms (!)
"SET search_path = ...;" -> 0.026 ms
"SET search_path = DEAULT;" -> 0.059 ms
"RESET ROLE;" -> 0.026 ms
So "SET ROLE ...;" is more than 260 times slower than "SET search_path = ...;"! 7.109 vs. 0.026 ms.
I was curious to see what happens when I change the order of statements as follows ("SET ROLE ...;" happens after executing "SET search_path = ...;"):
1. SET search_path = tenant1;
2. SET ROLE tenant1;
3. -- execute various tenant1 related sql statements here
4. SET search_path = DEAULT;
5. RESET ROLE;
Logging of all statements was again enabled in production for around 6 minutes. And these were the results:
"SET search_path = ...;" -> 7.444 ms (!)
"SET ROLE ...;" -> 0.141 ms
"SET search_path = DEAULT;" -> 0.036 ms
"RESET ROLE;" -> 0.025 ms
And guess what? Now "SET search_path = ...;" takes more than 7 ms on average is more than 50 times slower than "SET ROLE ...;"! 7.444 vs. 0.141 ms.
I think I have found something here. It looks like that the order of statements is affecting their duration. I somehow have the feeling that the first statement after "RESET ROLE;" experiences a performance degradation.
When I use the psql cli on the same database I can see via "\timing" that the first statement after "RESET ROLE;" is significantly slower. I was even able to strip it down to two statements ("SET ROLE ...;" and "RESET ROLE;"):
mydb=> set role tenant1;
SET
Time: 0.516 ms
mydb=> reset role;
RESET
Time: 0.483 ms
mydb=> set role tenant1; <-- first statement after "reset role;"
SET
Time: 10.177 ms <-- significantly slower
mydb=> reset role;
RESET
Time: 0.523 ms
mydb=> set role tenant1; <-- first statement after "reset role;"
SET
Time: 12.119 ms <-- significantly slower
mydb=> reset role;
RESET
Time: 0.462 ms
mydb=> set role tenant1; <-- first statement after "reset role;"
SET
Time: 19.533 ms <-- significantly slower
mydb=>
Maybe my observations here are already sufficient to find out what happens here? I guess that my setup with 1k rows in pg_roles and 1.5m rows in pg_class is probably the cause.
Does it help when I create a test setup with a docker image that contains a database with that many entries in pg_roles and pg_class and share it here?
Regards,
Ulf