Weird planner issue on a standby

Guillaume Lelarge <guillaume@xxxxxxxxxxxx> · Tue, 11 Oct 2022 17:51:46 +0200

Hello,

My customer has a really weird planner issue on a standby.

First,
 the context. There's a primary, and two standbys, all on a 11.8 
release. (I know this isn't the latest release for the v11 branch.) The 
cluster is 5.5TB. The PostgreSQL configuration is the same on all 
servers. pg_db_role_setting is almost empty, and there's nothing 
specific to planning and replication.

Here is the issue. Queries on both standbys take a lot more time than on the primary. We eventually found that the queries take a lot of time to be planned, not to be executed. For example:

On the primary:
 Planning Time: 1806.574 ms
 Execution Time: 771.888 ms
On any of the standbys:
 Planning Time: 41566.439 ms
 Execution Time: 1125.352 ms

A simple EXPLAIN already shows this difference in duration.

The
 query in itself isn't really that big. 8 tables (without partitions), a
 few predicates. Nothing fancy. Nothing huge to plan. Here it is, a bit 
anonymized:

select ...
from t1
left outer join t2 on ...
left outer join t3 on ...
left outer join t4 on ...
left outer join t5 on ...
left outer join t6 on ...
left outer join t7 on ...
left outer join t8 on ...
where c1='...' and c2='...' 
  and c3>='...' and c4<='...' ;

Join conditions are really simple. There's no function called anywhere.

Plans on the three servers are exactly the same. Same nodes, same statistics, same calculations' results.

Moreover,
 we've looked into what the planner was doing with strace, and we've 
been surprised that it involved massive reading on tables (more than 
130,000 calls to read() on 15 files). We found that most of these files 
are the files for one of the tables on this query. Also, this massive 
reading happens only on standbys, not on the primary.

Our customer took the time to drop and create the standbys yesterday night, but the new standbys still have the issue.

And
 finally, we discovered that promoting a standby brings back the old 
performance. We have the same performance between the primary and the 
promoted standby.

To be honest, I'm lost, I don't know what to do next. But I have some questions :)

*
 The planner seems to read tables to generate the plans and/or select 
the right plan. Why does it do that? I thought it only reads indexes if 
needed, but it is quite clear it reads tables also.
* How can
 the replication have an impact on the planner ? (I'm not sure I'm 
asking the right question... I mean, why do I see a difference in 
behaviour between a standby and an old standby, new autonomous server?)
* Do you have any idea on how to fix this?

Thank you.

Regards.

-- 
Guillaume.