Hello,
I'm DBA for several quite large postgres databases on debian servers. Current version:
PostgreSQL 9.2.6 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
but the problem I'm about to describe was discovered before upgrade, while running v9.2.4.
I have set up a cron-job that sends me daily reports on bloat amount in tables and indices, which I calculate using pg_stats, not pgstattuple, for performance and I/O reasons. If the bloat amount or percentage are big enough, I use pg_repack to get rid of it. At some point I noticed, that some tables keep showing up in the reports with the same amount of bloat, which pg_repack was seemingly unable to remove. Investigation showed that pgstattuple gave very different results than my bloat-finding query. Reason - for some tables there are some columns that never show up in pg_statistics.
I'm going to give an example of a table that has this issue. I have vacuumed it, analyzed it, used "alter column set statistic" specifically on the missing column - no results. I hope someone has encountered this issue before or at least could give me some new ideas, why this is or how to fix it.
Here's the example:
# \d+ d3s_log
Table "public.d3s_log"
Column | Type | Modifiers | Storage | Stats target | Description
---------------+-----------------------------+------------------------------------------------------+----------+--------------+-------------
id | bigint | not null default nextval('d3s_log_id_seq'::regclass) | plain | |
users_id | integer | not null | plain | |
bet_id | integer | not null | plain | |
bet_state | integer | | plain | |
d3s_trans_id | character varying(255) | not null | extended | |
operation | d3s_operation_enum | not null | plain | |
created | timestamp without time zone | not null default now() | plain | |
success | boolean | not null | plain | |
response_time | timestamp without time zone | | plain | |
path | character varying(255) | default NULL::character varying | extended | |
params | text | | extended | |
request | text | | extended | 1000 |
response | text | | extended | |
Indexes:
"d3s_log_pkey" PRIMARY KEY, btree (id)
"d3s_log_bet_id_operation_success" btree (bet_id, operation, success)
"d3s_log_bets_id_index" btree (bet_id)
"d3s_log_created" btree (created)
"d3s_log_users_id_index" btree (users_id)
Has OIDs: no
# SELECT s.staattnum, a.attname FROM pg_statistic s INNER JOIN pg_attribute a ON (a.attrelid,a.attnum)=(s.starelid,s.staattnum) WHERE starelid = 'd3s_log'::regclass;
staattnum | attname
-----------+---------------
1 | id
2 | users_id
3 | bet_id
4 | bet_state
5 | d3s_trans_id
6 | operation
7 | created
8 | success
9 | response_time
10 | path
11 | params
13 | response
(12 rows)
## As can be seen, column 12, "request", is missing.
# select tablename, attname, null_frac, avg_width from pg_stats where tablename = 'd3s_log';
tablename | attname | null_frac | avg_width
-----------+---------------+------------+-----------
d3s_log | id | 0 | 8
d3s_log | users_id | 0 | 4
d3s_log | bet_id | 0 | 4
d3s_log | bet_state | 1 | 4
d3s_log | d3s_trans_id | 0 | 23
d3s_log | operation | 0 | 4
d3s_log | created | 0 | 8
d3s_log | success | 0 | 1
d3s_log | response_time | 0 | 8
d3s_log | path | 0.00303333 | 48
d3s_log | params | 0 | 180
d3s_log | response | 0.00123333 | 456
(12 rows)
## Again, "request" is missing, but there were small changes in avg_width for other columns after analyze, so some statistics calculation seems to be working.
The difference between pgstattuple and calculated bloat is 6.5 GB out of 15 GB total table size, which is approximately the amount of data stored in the extended "request" column.
I also did pg_dump of this table and restored it on another server, running v9.2.4 - no errors, analyze calculates statistics, but "request" is still missing there, too.
I have also checked other tables with the same problem - the common thing seems to be that all the missing columns are extended, type text or varchar, but I'm not 100% sure, haven't checked them all yet. The size of the missing column is not a common thing - on some tables the missing column is almost empty.
Looking forward to your answers and ideas,
Katie
I'm DBA for several quite large postgres databases on debian servers. Current version:
PostgreSQL 9.2.6 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
but the problem I'm about to describe was discovered before upgrade, while running v9.2.4.
I have set up a cron-job that sends me daily reports on bloat amount in tables and indices, which I calculate using pg_stats, not pgstattuple, for performance and I/O reasons. If the bloat amount or percentage are big enough, I use pg_repack to get rid of it. At some point I noticed, that some tables keep showing up in the reports with the same amount of bloat, which pg_repack was seemingly unable to remove. Investigation showed that pgstattuple gave very different results than my bloat-finding query. Reason - for some tables there are some columns that never show up in pg_statistics.
I'm going to give an example of a table that has this issue. I have vacuumed it, analyzed it, used "alter column set statistic" specifically on the missing column - no results. I hope someone has encountered this issue before or at least could give me some new ideas, why this is or how to fix it.
Here's the example:
# \d+ d3s_log
Table "public.d3s_log"
Column | Type | Modifiers | Storage | Stats target | Description
---------------+-----------------------------+------------------------------------------------------+----------+--------------+-------------
id | bigint | not null default nextval('d3s_log_id_seq'::regclass) | plain | |
users_id | integer | not null | plain | |
bet_id | integer | not null | plain | |
bet_state | integer | | plain | |
d3s_trans_id | character varying(255) | not null | extended | |
operation | d3s_operation_enum | not null | plain | |
created | timestamp without time zone | not null default now() | plain | |
success | boolean | not null | plain | |
response_time | timestamp without time zone | | plain | |
path | character varying(255) | default NULL::character varying | extended | |
params | text | | extended | |
request | text | | extended | 1000 |
response | text | | extended | |
Indexes:
"d3s_log_pkey" PRIMARY KEY, btree (id)
"d3s_log_bet_id_operation_success" btree (bet_id, operation, success)
"d3s_log_bets_id_index" btree (bet_id)
"d3s_log_created" btree (created)
"d3s_log_users_id_index" btree (users_id)
Has OIDs: no
# SELECT s.staattnum, a.attname FROM pg_statistic s INNER JOIN pg_attribute a ON (a.attrelid,a.attnum)=(s.starelid,s.staattnum) WHERE starelid = 'd3s_log'::regclass;
staattnum | attname
-----------+---------------
1 | id
2 | users_id
3 | bet_id
4 | bet_state
5 | d3s_trans_id
6 | operation
7 | created
8 | success
9 | response_time
10 | path
11 | params
13 | response
(12 rows)
## As can be seen, column 12, "request", is missing.
# select tablename, attname, null_frac, avg_width from pg_stats where tablename = 'd3s_log';
tablename | attname | null_frac | avg_width
-----------+---------------+------------+-----------
d3s_log | id | 0 | 8
d3s_log | users_id | 0 | 4
d3s_log | bet_id | 0 | 4
d3s_log | bet_state | 1 | 4
d3s_log | d3s_trans_id | 0 | 23
d3s_log | operation | 0 | 4
d3s_log | created | 0 | 8
d3s_log | success | 0 | 1
d3s_log | response_time | 0 | 8
d3s_log | path | 0.00303333 | 48
d3s_log | params | 0 | 180
d3s_log | response | 0.00123333 | 456
(12 rows)
## Again, "request" is missing, but there were small changes in avg_width for other columns after analyze, so some statistics calculation seems to be working.
The difference between pgstattuple and calculated bloat is 6.5 GB out of 15 GB total table size, which is approximately the amount of data stored in the extended "request" column.
I also did pg_dump of this table and restored it on another server, running v9.2.4 - no errors, analyze calculates statistics, but "request" is still missing there, too.
I have also checked other tables with the same problem - the common thing seems to be that all the missing columns are extended, type text or varchar, but I'm not 100% sure, haven't checked them all yet. The size of the missing column is not a common thing - on some tables the missing column is almost empty.
Looking forward to your answers and ideas,
Katie