An additional data point is that this database is a streaming read-only replica. We do not see a corresponding spike in active sessions on the primary (15 active sessions) when the replica active sessions spike. No locks or wait events are observable in the primary when the replica active sessions spike.
Craig
On Mar 31, 2021, 10:26 AM -0600, Hotmail <crajac66@xxxxxxxxxxx>, wrote:
To complete the picture here is our grafana graph on locks. The only locks we see are access share locks.
<Screen Shot 2021-03-31 at 10.23.30 AM.png>
CraigOn Mar 31, 2021, 10:22 AM -0600, Hotmail <crajac66@xxxxxxxxxxx>, wrote:
We actually keep track of the wait events in another grafana graph. We use the following query to generate the graph. (We assume that an active session that has a NULL wait_event is "ON CPU”. Not sure if our assuming an active session with a null wait being on cpu is valid. We sample pg_stat_activity every 30 seconds.
SELECT
coalesce(wait_event, 'ON CPU') AS type,
count(*)
FROM
pg_stat_activity
WHERE
state = 'active'
AND usename != 'repmgr'
GROUP BY
TYPE
ORDER BY
count(*) DESC ";
Here’s a sample from our wait_event graph during times when we see high active sessions but a sleeping postgres OS process. Is it possible we could be hitting an un-instremented wait event?
<Screen Shot 2021-03-31 at 10.18.01 AM.png>
CraigOn Mar 31, 2021, 10:00 AM -0600, Hotmail <crajac66@xxxxxxxxxxx>, wrote:
Unfortunately, the wait_event and wait_event_type columns are all NULL for these active sessions.
CraigOn Mar 31, 2021, 3:47 AM -0600, Laurenz Albe <laurenz.albe@xxxxxxxxxxx>, wrote:
On Tue, 2021-03-30 at 12:22 -0600, Hotmail wrote:
We are trying to gain some insight into a performance bottleneck that we are hitting while load testing Postgres on 11.11.
[hundreds of active sessions, but CPU is not maxed out]
With that many active sessions you are probably hitting some contention inside
the database. Look at "wait_event" and "wait_event_type" in pg_stat_activity.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com