Re: pgstat wait timeout encountered

Guillaume Lelarge <guillaume@xxxxxxxxxxxx> · Tue, 10 Mar 2015 13:52:57 +0100

2015-03-10 13:27 GMT+01:00 Mark Steben <mark.steben@xxxxxxxxxxxxxxxxx>:
explain analyze doesn't reveal much:

explain analyze SELECT * FROM pg_stat_database WHERE datname='prime_production';
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Seq Scan on pg_database d  (cost=0.00..1.11 rows=1 width=68) (actual time=11.201..11.203 rows=1 loops=1)
   Filter: (datname = 'prime_production'::name)
   Rows Removed by Filter: 4
 Total runtime: 11.320 ms
(4 rows)

explain on another 'offending' query also isn't very helpful

explain analyze select * from pg_stat_bgwriter
;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Result  (cost=0.00..0.04 rows=1 width=0) (actual time=10.771..10.771 rows=1 loops=1)
 Total runtime: 10.794 ms

I've performed a pg_stat_reset this morning.  That hasn't been done in awhile.  
We'll see if that makes a difference

That usually doesn't make a difference. "pgstat wait timeout" is the usual message when your disks are way too slow.

On Mon, Mar 9, 2015 at 4:47 PM, John Scalia <jayknowsunix@xxxxxxxxx> wrote:

    Have you tried doing an "explain" on your query. Returning that many
    rows sounds like potential corruption.

    On 3/9/2015 4:39 PM, Mark Steben wrote:

      only 5 rows on this table although huge amount of
        returned tuples on one db:

        email_web_service=# select * from pg_stat_database;

        -[ RECORD 1 ]--+------------------------------

        datid          | 1

        datname        | template1

        numbackends    | 0

        xact_commit    | 192026

        xact_rollback  | 0

        blks_read      | 610450

        blks_hit       | 7723329

        tup_returned   | 33702423

        tup_fetched    | 3880133

        tup_inserted   | 7620

        tup_updated    | 291846

        tup_deleted    | 7620

        conflicts      | 0

        temp_files     | 0

        temp_bytes     | 0

        deadlocks      | 0

        blk_read_time  | 0

        blk_write_time | 0

        stats_reset    | 2013-12-17 11:30:58.890718-05

        -[ RECORD 2 ]--+------------------------------

        datid          | 12865

        datname        | template0

        numbackends    | 0

        xact_commit    | 48676

        xact_rollback  | 0

        blks_read      | 16904

        blks_hit       | 1321868

        tup_returned   | 15783033

        tup_fetched    | 274467

        tup_inserted   | 0

        tup_updated    | 0

        tup_deleted    | 0

        conflicts      | 0

        temp_files     | 0

        temp_bytes     | 0

        deadlocks      | 0

        blk_read_time  | 0

        blk_write_time | 0

        stats_reset    | 2014-03-11 06:56:00.139609-04

        -[ RECORD 3 ]--+------------------------------

        datid          | 12870

        datname        | postgres

        numbackends    | 0

        xact_commit    | 809894

        xact_rollback  | 30

        blks_read      | 959513

        blks_hit       | 59209984

        tup_returned   | 144866984

        tup_fetched    | 37242299

        tup_inserted   | 7726

        tup_updated    | 292637

        tup_deleted    | 7620

        conflicts      | 0

        temp_files     | 0

        temp_bytes     | 0

        deadlocks      | 0

        blk_read_time  | 0

        blk_write_time | 0

        stats_reset    | 2013-10-17 20:13:08.614236-04

        -[ RECORD 4 ]--+------------------------------

        datid          | 1998756

        datname        | prime_production

        numbackends    | 198

        xact_commit    | 1904931675

        xact_rollback  | 8354053

        blks_read      | 4085207044

        blks_hit       | 1384556112952

          tup_returned   | 22953800158363

          tup_fetched    | 2347909884098

          tup_inserted   | 2977265442

          tup_updated    | 252247999

        tup_deleted    | 35834382

        conflicts      | 0

        temp_files     | 25119

        temp_bytes     | 1456041951904

        deadlocks      | 43

        blk_read_time  | 0

        blk_write_time | 0

        stats_reset    | 2015-01-27 11:54:14.737594-05

        -[ RECORD 5 ]--+------------------------------

        datid          | 3297782

        datname        | email_web_service

        numbackends    | 3

        xact_commit    | 46

        xact_rollback  | 0

        blks_read      | 4

        blks_hit       | 2764

        tup_returned   | 566

        tup_fetched    | 336

        tup_inserted   | 8

        tup_updated    | 24

        tup_deleted    | 8

        conflicts      | 0

        temp_files     | 0

        temp_bytes     | 0

        deadlocks      | 0

        blk_read_time  | 0

        blk_write_time | 0

        stats_reset    | 2015-03-09 16:36:12.727615-04

        On Mon, Mar 9, 2015 at 4:21 PM, John
          Scalia <jayknowsunix@xxxxxxxxx>
          wrote:

             Not to sound stupid,
              but two possible issues:

              1) this is a really big table and

              2) you have no index on the field "datname"

              Possible?

              Jay

                  On 3/9/2015 3:50 PM, Mark Steben wrote:

                      Good afternoon,

                      We have been sporadically receiving a warning:  pgstat
                        wait timeout 

                        in our production environment.  This has
                          been always immediately followed by

                        a successful execution of the query.

                        EX:

                        Mar  6
                            00:31:29 ardbc01 postgres[42903]: [3-1] mavmail@10.93.156.42:prime_production:0WARNING: 
                            pgstat wait timeout

                            Mar  6 00:31:29 ardbc01 postgres[42903]:
                            [3-2] mavmail@10.93.156.42:prime_production:0STATEMENT: 
                            SELECT * FROM pg_stat_database WHERE
                            datname='prime_production';

                            Mar  6 00:31:29 ardbc01 postgres[42903]:
                            [4-1] mavmail@10.93.156.42:prime_production:0LOG: 
                            duration: 10137.143 ms  statement: SELECT *
                            FROM pg_stat_database WHERE
                            datname='prime_production';

                        and 

                        mavmail@10.93.156.42:prime_production:0WARNING: 

                            pgstat wait timeout

                            Mar  6 00:31:46 ardbc01 postgres[42903]:
                            [6-2] mavmail@10.93.156.42:prime_production:0STATEMENT: 
                            SELECT SUM(idx_blks_hit) AS hits,
                            SUM(idx_blks_read) AS reads FROM
                            pg_statio_user_indexes

                            Mar  6 00:31:46 ardbc01 postgres[42903]:
                            [7-1] mavmail@10.93.156.42:prime_production:0LOG: 
                            duration: 10144.601 ms  statement: SELECT
                            SUM(idx_blks_hit) AS hits,
                            SUM(idx_blks_read) AS reads FROM
                            pg_statio_user_indexes

                        This is not confined to queries of pg stat
                          infrastructure tables. In another database we
                          can receive the same warning on a vacuum:

                          mavmail@10.93.156.45:mavmail
                            WARNING:  pgstat wait timeout

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [12-2] mavmail@10.93.156.45:mavmail
                            STATEMENT:  VACUUM VERBOSE ANALYZE
                            update_queues;

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [13-1] mavmail@10.93.156.45:mavmail
                            INFO:  vacuuming "public.update_queues"

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [13-2] mavmail@10.93.156.45:mavmail
                            STATEMENT:  VACUUM VERBOSE ANALYZE
                            update_queues;

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [14-1] mavmail@10.93.156.45:mavmail
                            INFO:  index "uq_qid" now contains 0 row
                            versions in 7 pages

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [14-2] mavmail@10.93.156.45:mavmail
                            DETAIL:  0 index row versions were removed.

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [14-3] #0114 index pages have been deleted,
                            4 are currently reusable.

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [14-4] #011CPU 0.00s/0.00u sec elapsed 0.00
                            sec.

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [14-5] mavmail@10.93.156.45:mavmail
                            STATEMENT:  VACUUM VERBOSE ANALYZE
                            update_queues;

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [15-1] mavmail@10.93.156.45:mavmail
                            INFO:  "update_queues": found 0 removable, 0
                            nonremovable row versions in 0 out of 0
                            pages

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [15-2] mavmail@10.93.156.45:mavmail
                            DETAIL:  0 dead row versions cannot be
                            removed yet.

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [15-3] #011There were 0 unused item
                            pointers.

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [15-4] #0110 pages are entirely empty.

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [15-5] #011CPU 0.00s/0.00u sec elapsed 0.00
                            sec.

                            Mar  6 01:16:44 ardbc01 postgres[66365]:
                            [15-6] mavmail@10.93.156.45:mavmail
                            STATEMENT:  VACUUM VERBOSE ANALYZE
                            update_queues;

                        I did check older posts on
                              pgadmin - one insight involved the
                              possible inability of autovacuum to grab
                              the latest stats as the stats collector
                              might have gotten stuck in a loop and
                              could not update the stats file.  That was
                              from an older PSQL version and we don't
                              employ autovacuum in these databases.

                        Any insights appreciated:

                          postgres version: 9.2.5

                          platform:

                                Linux ardbc01 2.6.32-358.18.1.el6.x86_64
                              #1 SMP Wed Aug 28 17:19:38 UTC 2013 x86_64
                              x86_64 x86_64 GNU/Linux

                        Thank you, 

                            Mark
                                        Steben 

                                   Database Administrator

                                    @utoRevenue | Autobase  

                                      CRM

                                      division of Dominion Dealer
                                      Solutions 

                                    95D
                                      Ashley Ave.

                                      West Springfield, MA 01089

                                    t:
                                      413.327-3045

                                    f:
                                      413.383-9567

                            www.fb.com/DominionDealerSolutions

                                    www.twitter.com/DominionDealer

                                     www.drivedominion.com

        -- 

          Mark Steben

                 Database Administrator

                  @utoRevenue | Autobase 

                    CRM
                    division of Dominion Dealer Solutions 

                  95D Ashley Ave.

                    West Springfield, MA 01089 

                  t: 413.327-3045

                  f: 413.383-9567

          www.fb.com/DominionDealerSolutions

                  www.twitter.com/DominionDealer

                   www.drivedominion.com

-- 
Mark Steben 
 Database Administrator
@utoRevenue | Autobase  
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089 

t: 413.327-3045 
f: 413.383-9567 
www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com

-- 
Guillaume.
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com