One of our instances has been behaving -- oddly. Most queries are
blazing fast. It appears to just be some of the stat views that are
slow. Queries against the following views are quick: pg_stat_activity, pg_stat_xact_all_tables, pg_stat_xact_sys_tables, pg_stat_xact_user_tables, pg_statio_sys_sequences (no system sequences), pg_stat_xact_user_functions, and pg_stat_replication. When I query one of the other statistics views, the queries take roughly 10 seconds when they used to take milliseconds. Here's an explain of one such query: https://explain.depesz.com/s/IVEu Logs show 'using stale statistics instead of current ones because stats collector is not responding' when I query the stats views. CPU usage is very low. Disk I/O looks okay to me. -bash-4.1$ iostat Linux 2.6.32-573.12.1.el6.x86_64 (hostname) 07/25/2016 _x86_64_ (16 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 2.80 0.00 0.15 0.01 0.00 97.04 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 0.45 4.85 7.45 33142948 50914838 sdb 31.48 1403.57 713.55 9597023154 4878965216 dm-0 0.14 0.58 0.56 3949720 3847592 dm-1 1.00 4.26 6.83 29155482 46727688 Most of the files in $PGDATA/pg_stat_tmp haven't been updated by PostgreSQL since July 7th at 14:16. The one exception is pgss_query_texts.stat. I shouldn't have but I touched $PGDATA/pg_stat_tmp/db* on July 12th at 15:14 (doing so did trigger an autovacuum though). I set track_activities and track_counts to off and reloaded config (pg_ctl reload). The log indicated that it noticed the change. I created a directory /dev/shm/pg_stat_tmp, set track_activities and track_counts on, set stats_temp_directory to '/dev/shm/pg_stat_tmp' and reloaded config. Again, the log indicated that it noticed the change but nothing happened with stats. The new directory is empty. I notified the statistics collector to quit (kill -QUIT). It did and the postmaster restarted it. The new stats directory is still empty. The only updates to the old stats directory is the pgss_query_texts.stat file. I tried getting stack traces and the like with gdb and strace but it wasn't really helpful. Most of what I have tried was the result of help by Jim Nasby on postgresteam.slack.com. He and Lukas Fittl suggested that I post about my problem. There's one thing that we all agree on... It should not be so hard to figure out what is going on with the statistics collector. I tried turning stats off and on again today. Here's what the two pg_stat_tmp directories look like today: so now the directory looks like: -bash-4.1$ ls -alh 9.4/data/pg_stat_tmp/ total 4.3M drwx------ 2 postgres postgres 4.0K Jul 7 14:16 . drwx------ 19 postgres postgres 4.0K Jul 25 11:20 .. -rw------- 1 postgres postgres 3.4K Jul 12 15:14 db_0.stat -rw------- 1 postgres postgres 26K Jul 12 15:14 db_13003.stat -rw------- 1 postgres postgres 13K Jul 12 15:14 db_1.stat -rw------- 1 postgres postgres 140K Jul 12 15:14 db_2473933.stat -rw------- 1 postgres postgres 203K Jul 12 15:14 db_2476698.stat -rw------- 1 postgres postgres 27K Jul 12 15:14 db_2595334.stat -rw------- 1 postgres postgres 28K Jul 12 15:14 db_2631762.stat -rw------- 1 postgres postgres 1.5K Jul 7 14:16 global.stat -rw------- 1 postgres postgres 3.9M Jul 25 13:39 pgss_query_texts.stat -bash-4.1$ ls -alh /dev/shm/pg_stat_tmp/ total 0 drwxr-xr-x 2 postgres postgres 40 Jul 12 17:19 . drwxrwxrwt 3 root root 80 Jul 11 17:31 .. Does anyone have any suggestions on how to: - see what the statistics collector is doing? - tell the postmaster to give the statistics collector the up to date config? - get stats working without restarting the instance? (I am not yet sure when that can happen but it would have to be scheduled [in advance] to occur during an after hours maintenance window. Thanks in advance, Matt --
Matthew Musgrove
Senior Software Engineer Assurant Mortgage Solutions 817.788.4482 mmusgrove@xxxxxxxxxxxxxxxxxx matthew.musgrove@xxxxxxxxxxxx |