One of servers under my support 2 days ago produced the next error:
ERROR: could not read block 135 in file "base/16404/118881486": read only 0 of 8192 bytes
Server version 9.0.6. No db or server crashes or db recovery happen on that server since setup.
Server is db backend for very large web project under quite a heavy load (10k+ request per second to db).
No other strange errors found in the logs since server was put to production half year ago.
Now what happened (from DB log):
2012-02-17 22:35:58 MSK 14333 [vxid:340/18229999 txid:2341883282] [DELETE] LOG: duration: 5.669 ms execute <unnamed>: delete from agency_statistics
2012-02-17 22:35:58 MSK 14333 [vxid:340/18230000 txid:0] [BIND] LOG: duration: 2.787 ms bind <unnamed>: insert into agency_statistics (employer_id, area_id, area_pa
th, professional_area_id, vacancies_number) select e.employer_id, vb.area_id, a.path, s.professional_area_id, count(distinct v.vacancy_id) from vacancy v inner join employer e on v.employer
_id = e.employer_id inner join vacancy_body vb on v.vacancy_body_id = vb.vacancy_body_id inner join vacancy_body_specialization vs on vb.vacancy_body_id = vs.vacancy_body_id inner join spec
ialization s on s.specialization_id = vs.specialization_id inner join area a on vb.area_id = a.area_id where v.archived_status in (0) and v.disabled = false and e.category = $1 and e.state
= $2 and e.manager_id > 0 group by e.employer_id, vb.area_id, a.path, s.professional_area_id
2012-02-17 22:35:59 MSK 14333 [vxid:340/18230000 txid:2341883336] [INSERT] LOG: duration: 726.700 ms execute <unnamed>: insert into agency_statistics (employer_id,
area_id, area_path, professional_area_id, vacancies_number) select e.employer_id, vb.area_id, a.path, s.professional_area_id, count(distinct v.vacancy_id) from vacancy v inner join employer
e on v.employer_id = e.employer_id inner join vacancy_body vb on v.vacancy_body_id = vb.vacancy_body_id inner join vacancy_body_specialization vs on vb.vacancy_body_id = vs.vacancy_body_id
inner join specialization s on s.specialization_id = vs.specialization_id inner join area a on vb.area_id = a.area_id where v.archived_status in (0) and v.disabled = false and e.category =
$1 and e.state = $2 and e.manager_id > 0 group by e.employer_id, vb.area_id, a.path, s.professional_area_id
2012-02-17 22:35:59 MSK 14333 [vxid:340/18230070 txid:0] [BIND] LOG: duration: 30.195 ms bind S_35: insert into agency_statistics (employer_id, area_id, area_path,
professional_area_id, vacancies_number, rank) values ($1, $2, $3, $4, $5, 0)
2012-02-17 22:35:59 MSK 14333 [vxid:340/18230082 txid:2341883427] [INSERT] ERROR: could not read block 135 in file "base/16404/118881486": read only 0 of 8192 bytes
2012-02-17 22:35:59 MSK 14333 [vxid:340/18230082 txid:2341883427] [INSERT] STATEMENT: insert into agency_statistics (employer_id, area_id, area_path, professional_area_id, vacancies_number, rank) values ($1, $2, $3, $4, $5, 0)
No concurrent writes was happened to that table outside of that connection.
May be autovacuum were fired on that table between delete and first or second inserts (unfortunately it wasn't logged).
I recreated table from scratch and keep the damaged table under another name (through alter table agency_statistics rename to agency_statistics_old).
So I have files to dig into.
What I see in file system:
hh=# SELECT relfilenode from pg_class where relname='agency_statistics_old';
relfilenode
-------------
118881486
postgres@db10:~/tmp$ ls -la /var/lib/postgresql/9.0/main/base/16404/118881486
-rw------- 1 postgres postgres 0 2012-02-20 12:04 /var/lib/postgresql/9.0/main/base/16404/118881486
So table file size zero bytes (seems autovacuum truncated that table to 0 bytes).
But index show:
hh=# SELECT relfilenode from pg_class where relname='agency_statistics_pkey';
relfilenode
-------------
118881489
postgres@db10:~/tmp$ ls -la /var/lib/postgresql/9.0/main/base/16404/118881489
-rw------- 1 postgres postgres 4956160 2012-02-20 12:07 /var/lib/postgresql/9.0/main/base/16404/118881489
hh=# SELECT * from pgstatindex('agency_statistics_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 2
index_size | 4947968
root_block_no | 295
internal_pages | 1
leaf_pages | 601
empty_pages | 0
deleted_pages | 1
avg_leaf_density | 0.45
leaf_fragmentation | 13.14
I out of ideas now.
Any suggestions where and what I should look next?
Kind Regards,
Maksym
--
Maxim Boguk
Senior Postgresql DBA.
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
Skype: maxim.boguk
Jabber: maxim.boguk@xxxxxxxxx
LinkedIn profile: http://nz.linkedin.com/in/maximboguk
"If they can send one man to the moon... why can't they send them all?"
МойКруг: http://mboguk.moikrug.ru/
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."