Hi,
Thanks for your help Alban !
Alexis.
2014-08-09 18:46 GMT+02:00 Alban Hertroys <haramrae@xxxxxxxxx>:
On 09 Aug 2014, at 13:24, Alexis Bernard <alexis@xxxxxxxxxx> wrote:What’s the output of: VACUUM VERBOSE tracks;
> Primary key definition: "tracks_pkey" PRIMARY KEY, btree (id)
> select version(): PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-16ubuntu6) 4.8.2, 64-bit
> Ubuntu trusty with kernel 3.13.0-29-generic.
>
> I have a autovacuum process running and configuration is by default: ps aux | grep vacuum | grep -v grep
> postgres 587 0.0 0.0 1174304 3720 ? Ss juil.30 0:05 postgres: autovacuum launcher process
>
> I tried to vacuum manualy.
Anything that looks abnormal?
INFO: vacuuming "public.tracks"
INFO: scanned index "tracks_pkey" to remove 25777 row versions
DETAIL: CPU 0.03s/0.12u sec elapsed 0.87 sec.
INFO: scanned index "index_tracks_on_permalink" to remove 25777 row versions
DETAIL: CPU 0.07s/0.40u sec elapsed 5.30 sec.
INFO: "tracks": removed 25777 row versions in 5374 pages
DETAIL: CPU 0.00s/0.14u sec elapsed 0.66 sec.
INFO: index "tracks_pkey" now contains 1657345 row versions in 4594 pages
DETAIL: 17739 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "index_tracks_on_permalink" now contains 1657345 row versions in 6143 pages
DETAIL: 25715 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "tracks": found 17560 removable, 1657349 nonremovable row versions in 35586 out of 35586 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 64749 unused item pointers.
0 pages are entirely empty.
CPU 0.14s/0.86u sec elapsed 7.30 sec.
INFO: vacuuming "pg_toast.pg_toast_17228"
INFO: index "pg_toast_17228_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_17228": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
What do below queries return? They’re from the manual at: http://www.postgresql.org/docs/9.3/static/routine-vacuuming.html
SELECT c.oid::regclass as table_name,
greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm');
table_name | age
--------------------------------------------+----------
pg_statistic | 37049759
pg_type | 37049759
artists | 37042311
pg_authid | 37049759
pg_attribute | 37049759
pg_proc | 37049759
pg_class | 37049759
pg_user_mapping | 37049759
pg_constraint | 37049759
pg_inherits | 37049759
pg_index | 37049759
pg_operator | 37049759
pg_opfamily | 37049759
pg_opclass | 37049759
pg_am | 37049759
pg_amop | 37049759
pg_amproc | 37049759
pg_language | 37049759
pg_largeobject_metadata | 37049759
pg_database | 37049759
pg_rewrite | 37049759
pg_trigger | 37049759
pg_event_trigger | 37049759
pg_description | 37049759
pg_cast | 37049759
pg_enum | 37049759
pg_namespace | 37049759
pg_conversion | 37049759
pg_depend | 37049759
explorers | 36847481
pg_db_role_setting | 37049759
pg_tablespace | 37049759
pg_pltemplate | 37049759
pg_auth_members | 37049759
pg_shdepend | 37049759
pg_shdescription | 37049759
pg_ts_config | 37049759
pg_ts_config_map | 37049759
pg_ts_dict | 37049759
pg_ts_parser | 37049759
pg_ts_template | 37049759
pg_extension | 37049759
pg_foreign_data_wrapper | 37049759
pg_foreign_server | 37049759
pg_foreign_table | 37049759
pg_default_acl | 37049759
pg_seclabel | 37049759
pg_shseclabel | 37049759
pg_range | 37049759
pg_largeobject | 37049759
information_schema.sql_implementation_info | 37049759
information_schema.sql_languages | 37049759
information_schema.sql_packages | 37049759
information_schema.sql_sizing | 37049759
information_schema.sql_sizing_profiles | 37049759
delayed_jobs | 37042298
pg_attrdef | 37049759
pg_aggregate | 37049759
invitation_requests | 37042286
imports | 37042289
information_schema.sql_features | 37049759
pg_collation | 37049759
information_schema.sql_parts | 37049759
schema_migrations | 37042277
tracks | 37042273
invitations | 37042283
users | 37042270
playlists | 37042280
sources | 37042276
absorbed_tracks | 37042314
countries | 37042302
bridge_errors | 37042308
bridges | 37042305
countries_sources | 37042299
favorites | 37042295
import_entries | 37042292
SELECT datname, age(datfrozenxid) FROM pg_database;
datname | age
------------------+----------
template1 | 37049756
template0 | 37049756
postgres | 37049756
disco_production | 37049756
I’m wondering if perhaps autovacuum doesn’t run to completion, the above should determine that.
To get out of this pickle, you might want to try VACUUM FULL, but that puts an exclusive lock on the table and creates a new copy of it while it’s running, so you better have enough disk space to contain that table twice. It does rewrite the table though, which is what I hope fixes your issue.
The normal VACUUM should have updated the frozenid’s appropriately though, a VACUUM FULL should not actually be necessary.
The problem is probably not in the index then, or it wouldn’t have trouble re-creating it...
> I tried reindex:
>
> => reindex index tracks_pkey;
> ERROR: failed to find parent tuple for heap-only tuple at (38802,116) in table “tracks
Do I read that as disks 0 to 2 are part of an LSI SAS? Are you really using raid-1 then, as if I’m reading that correctly only 1 of the 3 disks is actually attached to the raid array (namely the Dell disk) and the others are "not claimed" (i.e. spares)?
> Here are more information about storage (raid 1):
Not sure, but I understood that the Dell RAID 1 controller contains 2 Seagate disks.
Anyway, from this it looks like you are using reliable storage that properly honours fsync commands. Or is this a hardware raid-controller with no battery backup unit and caching set to write-back instead of write-thru?
No idea. I can ask to my hosting provider if needed.
Oh, one last thing, the convention in this mailing list is to reply inline instead of top-posting.
Got it :-)
> lshw -class disk -class storage
> *-storage
> description: Serial Attached SCSI controller
> produit: SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
> fabriquant: LSI Logic / Symbios Logic
> identifiant matériel: 0
> information bus: pci@0000:01:00.0
> nom logique: scsi0
> version: 03
> bits: 64 bits
> horloge: 33MHz
> fonctionnalités: storage pm pciexpress vpd msi msix bus_master cap_list rom
> configuration: driver=mpt2sas latency=0
> ressources: irq:16 portE/S:2000(taille=256) mémoire:c5140000-c514ffff mémoire:c5100000-c513ffff mémoire:c5400000-c54fffff
> *-disk:0 NON-RÉCLAMÉ
> description: ATA Disk
> produit: ST1000NM0033-9ZM
> fabriquant: Seagate
> identifiant matériel: 0.0.0
> information bus: scsi@0:0.0.0
> version: GA04
> numéro de série: Z1W0HGB9
> configuration: ansiversion=5
> *-disk:1
> description: SCSI Disk
> produit: Virtual Disk
> fabriquant: Dell
> identifiant matériel: 1.0.0
> information bus: scsi@0:1.0.0
> nom logique: /dev/sda
> version: 1028
> taille: 931GiB (999GB)
> capacité: 931GiB (999GB)
> fonctionnalités: 15000rpm partitioned partitioned:dos
> configuration: ansiversion=6 sectorsize=512 signature=000e59a7
> *-disk:2 NON-RÉCLAMÉ
> description: ATA Disk
> produit: ST1000NM0033-9ZM
> fabriquant: Seagate
> identifiant matériel: 0.1.0
> information bus: scsi@0:0.1.0
> version: GA04
> numéro de série: Z1W0HFYZ
> configuration: ansiversion=5
> *-storage
> description: SATA controller
> produit: 6 Series/C200 Series Chipset Family SATA AHCI Controller
> fabriquant: Intel Corporation
> identifiant matériel: 1f.2
> information bus: pci@0000:00:1f.2
> version: 04
> bits: 32 bits
> horloge: 66MHz
> fonctionnalités: storage msi pm ahci_1.0 bus_master cap_list
> configuration: driver=ahci latency=0
> ressources: irq:43 portE/S:3048(taille=8) portE/S:3054(taille=4) portE/S:3040(taille=8) portE/S:3050(taille=4) portE/S:3020(taille=32) mémoire:c5204000-c52047ff
>
> Cheers,
> Alexis.
>
>
> 2014-08-09 12:35 GMT+02:00 Alban Hertroys <haramrae@xxxxxxxxx>:
> On 09 Aug 2014, at 11:38, Alexis Bernard <alexis@xxxxxxxxxx> wrote:
>
> > Hi,
> >
> > I am having regurlarly duplicated rows with the same primary key.
> >
> > => select id, created_at, updated_at from tracks where created_at = '2014-07-03 15:09:16.336488';
> > id | created_at | updated_at
> > --------+----------------------------+----------------------------
> > 331804 | 2014-07-03 15:09:16.336488 | 2014-07-03 15:37:55.253729
> > 331804 | 2014-07-03 15:09:16.336488 | 2014-07-03 15:37:55.253801
> >
> > => select id from tracks where id = 331804;
> > id
> > ----
> > (0 rows)
> >
>
> First of all, what is the definition of that primary key?
> What exact version of PG are you using?: select version();
> What exact OS is this on? What kind of storage?
> To me it looks like you may have run into transaction wrap-around or a corrupted index.
>
> Before you do anything, make a backup.
>
> Theorising that the issue here indeed is transaction wrap-around, what you’re seeing may be data from older transactions that has become newer because your current transaction txid is lower (due to the wraparound) than the txid of the transactions those rows belong(ed) to. If those transactions were committed, then you’re possibly seeing deleted or updated rows that are still around. TX wraparound can occur if you do not vacuum frequently enough and another thing that vacuum does is mark old rows obsolete so that the DB can reclaim the space they use. Seeing data from rows that are no longer there or that has been modified since seems to fit the bill here.
>
> Hence the question: When did you last (auto-)vacuum this table? Did you perhaps turn autovacuum off? Did it fall behind?
>
> If you have been vacuuming and the issue is a corrupt index: Does it help to reindex that table? You said that you have fsync on; what kind of storage is this database on? Something allowed that index to get corrupted. It is more likely that it’s caused by something in the underlying storage system (including the OS) than that it is a bug in PG.
>
> > => delete from tracks where created_at = '2014-07-03 15:09:16.336488' and updated_at = '2014-07-03 15:37:55.253801';
> > ERROR: update or delete on table "tracks" violates foreign key constraint "fk_sources_on_track_id" on table "sources"
> > DETAIL: Key (id)=(331804) is still referenced from table "sources”.
>
> Apparently there is a row from another table referencing this one. So either the referenced row does actually exist (corrupt index theory) or it doesn’t and the referencing row is from an older transaction as well (TX wraparound theory).
>
> Considering that you’re seeing this regularly, my bet is on TX wraparound.
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>
>
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.