On Mon, Sep 23, 2019 at 8:57 AM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Maybe check for waiting on a lock?
It'd be useful to look in pg_stat_activity and/or top(1) while the
initial query is running, to see if it seems to be eating CPU or
is blocked on some condition.
I think this will provide information that makes it seem less weird.
With your suggestion, I monitored postgres via top and pg_stat_activity in the various scenarios and found that an autovacuum was triggering which covered tables used in the insert into select. What seems to be happening is the autovacuum takes just about the same time as I give the query to run before giving up on it. The next time I run the query, the autovacuum is complete and the query runs normally.
Of course, I'd like to understand why the query never finishes.
when autovacuum is running:
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
38 1 postgres R 158m 8% 1 50% postgres: revwaste portal-local 172.19.0.5(39956) SELECT36 1 postgres S 171m 9% 0 0% postgres: autovacuum worker process portal-local
34 1 postgres S 186m 9% 0 0% postgres: autovacuum worker process portal-local
after autovacuum finishes and during remaining soft-lock:
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
45 1 postgres R 259m 13% 0 50% postgres: revwaste portal-local 172.19.0.5(39962) SELECT
20 1 postgres S 153m 8% 1 0% postgres: writer process
22 1 postgres S 153m 8% 0 0% postgres: autovacuum launcher process
45 1 postgres R 259m 13% 0 50% postgres: revwaste portal-local 172.19.0.5(39962) SELECT
20 1 postgres S 153m 8% 1 0% postgres: writer process
22 1 postgres S 153m 8% 0 0% postgres: autovacuum launcher process
Same for pg_stat_activity:
| backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query
-------+--------------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+------------------------------
| 2019-09-23 20:29:45.127527+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.128861+00 | | | active | 808 | 808 | select import_wss()
| 2019-09-23 20:30:01.624853+00 | 2019-09-23 20:30:58.047317+00 | 2019-09-23 20:30:58.047317+00 | 2019-09-23 20:30:58.047318+00 | | | active | | 808 | autovacuum: ANALYZE wss.RowCoding
| 2019-09-23 20:31:01.644824+00 | 2019-09-23 20:31:01.666711+00 | 2019-09-23 20:31:01.666711+00 | 2019-09-23 20:31:01.666712+00 | | | active | | 808 | autovacuum: ANALYZE wss.WSSData
| 2019-09-23 20:31:14.101808+00 | 2019-09-23 20:31:14.103306+00 | 2019-09-23 20:31:14.103306+00 | 2019-09-23 20:31:14.103307+00 | | | active | | 808 | select * from pg_stat_activity
(4 rows)
| backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query
-------+--------------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
| 2019-09-23 20:29:45.127527+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.128861+00 | | | active | 808 | 808 | select import_wss()
| 2019-09-23 20:34:21.01283+00 | 2019-09-23 20:34:21.014473+00 | 2019-09-23 20:34:21.014473+00 | 2019-09-23 20:34:21.014475+00 | | | active | | 808 | select * from pg_stat_activity
(2 rows)
-------+--------------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+------------------------------
| 2019-09-23 20:29:45.127527+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.128861+00 | | | active | 808 | 808 | select import_wss()
| 2019-09-23 20:30:01.624853+00 | 2019-09-23 20:30:58.047317+00 | 2019-09-23 20:30:58.047317+00 | 2019-09-23 20:30:58.047318+00 | | | active | | 808 | autovacuum: ANALYZE wss.RowCoding
| 2019-09-23 20:31:01.644824+00 | 2019-09-23 20:31:01.666711+00 | 2019-09-23 20:31:01.666711+00 | 2019-09-23 20:31:01.666712+00 | | | active | | 808 | autovacuum: ANALYZE wss.WSSData
| 2019-09-23 20:31:14.101808+00 | 2019-09-23 20:31:14.103306+00 | 2019-09-23 20:31:14.103306+00 | 2019-09-23 20:31:14.103307+00 | | | active | | 808 | select * from pg_stat_activity
(4 rows)
| backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query
-------+--------------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
| 2019-09-23 20:29:45.127527+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.128861+00 | | | active | 808 | 808 | select import_wss()
| 2019-09-23 20:34:21.01283+00 | 2019-09-23 20:34:21.014473+00 | 2019-09-23 20:34:21.014473+00 | 2019-09-23 20:34:21.014475+00 | | | active | | 808 | select * from pg_stat_activity
(2 rows)
Can you create a self-contained test case that acts like this?
I can try to duplicate it if this doesn't provide the details needed.
corey