We have a table that we use to store aggregated data, nightly we add a row for the previous day, and update all the rows in the table with the new information. The table is then vacuumed full.
Our setup is: Linux Kernel 2.6.4 SCSI Drives Hardware Raid 10 (4 Drives) Reiserfs PostgreSQL 7.4.5
The table looks like: qualitysmith=# \d admin_report_materialized_view Table "public.admin_report_materialized_view" Column | Type | Modifiers ------------------------------+---------+-------------------------------------------------------------------------------- id | integer | not null default nextval('public.admin_report_materialized_view_id_seq'::text) new_order_count | integer | not null default 0 out_of_area_count | integer | not null default 0 fake_order_count | integer | not null default 0 fake_zip_count | integer | not null default 0 in_area_zip_count | integer | not null default 0 customer_estimates | integer | not null default 0 total_estimate_count | integer | not null default 0 cancelled_appointment_count | integer | not null default 0 gross_estimate_count | integer | not null default 0 total_repair_estimate_count | integer | not null default 0 total_replace_estimate_count | integer | not null default 0 active_contractors_count | integer | not null default 0 estimate_percentage | numeric | not null default 0 trades_id | integer | not null master_day | date | not null Indexes: "admin_report_materialized_view_pkey" primary key, btree (id) "admin_report_materialized_view_unique_day_trades_id" unique, btree (master_day, trades_id)
Yesterday the following popped up in our log files: Feb 22 06:12:34 bigbrother postgres[9832]: [162-1] WARNING: index "admin_report_materialized_view_pkey" contains 18677 row versions, but table contains 18674 row versions Feb 22 06:12:34 bigbrother postgres[9832]: [162-2] HINT: Rebuild the index with REINDEX. Feb 22 06:12:34 bigbrother postgres[9832]: [163-1] WARNING: index "admin_report_materialized_view_unique_day_trades_id" contains 18677 row versions, but table contains 18674 row Feb 22 06:12:34 bigbrother postgres[9832]: [163-2] versions Feb 22 06:12:34 bigbrother postgres[9832]: [163-3] HINT: Rebuild the index with REINDEX.
Reindex then gives: qualitysmith=# reindex index admin_report_materialized_view_pkey; ERROR: could not create unique index DETAIL: Table contains duplicated values.
So digging through the table I found the following rows: qualitysmith=# select tableoid, oid, ctid, xmin, xmax, cmin, cmax, id from admin_report_materialized_view where oid = '104649735'; tableoid | oid | ctid | xmin | xmax | cmin | cmax | id -----------+-----------+---------+-----------+-----------+-----------+-----------+------ 104471713 | 104649735 | (13,42) | 704184382 | 704967810 | 704967810 | 15 | 2284 104471713 | 104649735 | (14,68) | 704184382 | 15 | 15 | 707367279 | 2284
And pg_filedump -i -f -R 13 $PGDATA/base/29539533/104471713 gives (partial):
Item 42 -- Length: 96 Offset: 4160 (0x1040) Flags: USED XID: min (704184382) CMIN|XMAX: 704967810 CMAX|XVAC: 15 Block Id: 470 linp Index: 65 Attributes: 16 Size: 28 infomask: 0x2912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
1040: 3e00f929 82f4042a 0f000000 0000d601 >..)...*........ 1050: 41001000 12291c00 07d43c06 ec080000 A....)....<..... 1060: 00000000 00000000 00000000 00000000 ................ 1070: 00000000 00000000 00000000 00000000 ................ 1080: 00000000 00000000 00000000 00000000 ................ 1090: 08000000 00000000 0a000000 e0060000 ................
pg_filedump -i -f -R 14 $PGDATA/base/29539533/104471713 gives (partial):
Item 68 -- Length: 96 Offset: 1440 (0x05a0) Flags: USED XID: min (704184382) CMIN|XMAX: 15 CMAX|XVAC: 707367279 Block Id: 14 linp Index: 68 Attributes: 16 Size: 28 infomask: 0x2912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
05a0: 3e00f929 0f000000 6f91292a 00000e00 >..)....o.)*.... 05b0: 44001000 12291c00 07d43c06 ec080000 D....)....<..... 05c0: 00000000 00000000 00000000 00000000 ................ 05d0: 00000000 00000000 00000000 00000000 ................ 05e0: 00000000 00000000 00000000 00000000 ................ 05f0: 08000000 00000000 0a000000 e0060000 ................
I am not positive my calculations here are correct. Finding the appropriate place in the CLOG for the XMIN I come up with file 029F byte offset 2400F which is 10. For XMAX (704967810) I come up with file 02A0 offset 13D20 which has value 20. (I don't know what these mean so they are included for anyone who may.)
Ok, so trying to hazard a best guess as to what happened. At this point it seems as though the duplicated row popped up as the result of the vacuum full. Reasoning on that would be that vacuum would be the only thing that would legitimately move tuples around in a table without changing their xmin. In this particular case the vacuum full is run out of crontab at 00:33 via:
/usr/local/bin/vacuumdb -z -q -f -d qualitysmith -t admin_report_materialized_view
Unfortunately I have not found any indication in our log files that the postgres server underwent anything unusual during this timeframe. No machine reboots, no postgres server restarts. For reference:
Feb 22 00:02:59 bigbrother postgres[30550]: [162-1] ERROR: column "job_type_id" of relation "promo_requests" does not exist Feb 22 00:04:13 bigbrother postgres[30759]: [162-1] LOG: recycled transaction log file "0000004400000060" Feb 22 00:04:13 bigbrother postgres[30759]: [163-1] LOG: recycled transaction log file "0000004400000061" Feb 22 00:04:13 bigbrother postgres[30759]: [164-1] LOG: recycled transaction log file "000000440000005F" Feb 22 00:05:42 bigbrother postgres[31051]: [162-1] ERROR: column "job_type_id" of relation "promo_requests" does not exist Feb 22 00:06:50 bigbrother postgres[31240]: [162-1] ERROR: duplicate key violates unique constraint "visitor_referal_code_unique_btree_idx" Feb 22 00:09:20 bigbrother postgres[31670]: [162-1] LOG: recycled transaction log file "0000004400000062" Feb 22 00:09:20 bigbrother postgres[31670]: [163-1] LOG: recycled transaction log file "0000004400000063" Feb 22 00:09:20 bigbrother postgres[31670]: [164-1] LOG: recycled transaction log file "0000004400000064" Feb 22 00:24:41 bigbrother postgres[1837]: [162-1] ERROR: duplicate key violates unique constraint "visitor_referal_code_unique_btree_idx" Feb 22 00:28:45 bigbrother postgres[2520]: [162-1] ERROR: duplicate key violates unique constraint "visitor_referal_code_unique_btree_idx" Feb 22 00:38:19 bigbrother postgres[4224]: [162-1] ERROR: duplicate key violates unique constraint "visitor_referal_code_unique_btree_idx" Feb 22 00:39:37 bigbrother postgres[4429]: [162-1] LOG: recycled transaction log file "0000004400000065" Feb 22 00:52:23 bigbrother postgres[6811]: [162-1] ERROR: duplicate key violates unique constraint "visitor_referal_code_unique_btree_idx"
So now I am back to wondering if this problem originated when I think that it did (between 2005/02/22 00:00:00 and 2005/02/22 01:00:00) or if this could have been hiding under the surface for longer than that.
I am suspecting a hardware/fs driver/kernel bug here, and I am hoping to be able to pinpoint a time in hopes of being able to find any indicator at all that could tell me where this all started.
Thanks for your time.
-- Jared Carr RBS Interactive - 89 Glass - QualitySmith jared@xxxxxxxxxxx
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings