[no subject]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I haven't looked at pg's code for creating an index, but seriously
suspect it's doing an extern sort then insert into the index.  Such
operations would have 100% correlation from the index insert point of
view and the "sort" could be in memory or the tape variety (more
efficient i/o pattern).

Summary, # of indexes, index correlation, pg's multi versioning,
shared_buffers and checkpoint_segments are interconnected in weird and
wonderful ways... Seldom have found "simple" solutions to performance
problems.

Marc


> -----Original Message-----
> From: pgsql-performance-owner@xxxxxxxxxxxxxx=20
> [mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of=20
> Aaron Turner
> Sent: Friday, February 10, 2006 3:17 AM
> To: pgsql-performance@xxxxxxxxxxxxxx
> Subject: [PERFORM] 10+hrs vs 15min because of just one index
>=20
> So I'm trying to figure out how to optimize my PG install=20
> (8.0.3) to get better performance without dropping one of my indexes.
>=20
> Basically, I have a table of 5M records with 3 columns:
>=20
> pri_key (SERIAL)
> data char(48)
> groupid integer
>=20
> there is an additional unique index on the data column.
>=20
> The problem is that when I update the groupid column for all=20
> the records, the query takes over 10hrs (after that I just=20
> canceled the update).  Looking at iostat, top, vmstat shows=20
> I'm horribly disk IO bound (for data not WAL, CPU 85-90%=20
> iowait) and not swapping.
>=20
> Dropping the unique index on data (which isn't used in the=20
> query), running the update and recreating the index  runs in=20
> under 15 min.=20
> Hence it's pretty clear to me that the index is the problem=20
> and there's really nothing worth optimizing in my query.
>=20
> As I understand from #postgresql, doing an UPDATE on one=20
> column causes all indexes for the effected row to have to be=20
> updated due to the way PG replaces the old row with a new one=20
> for updates.  This seems to explain why dropping the unique=20
> index on data solves the performance problem.
>=20
> interesting settings:
> shared_buffers =3D 32768
> maintenance_work_mem =3D 262144
> fsync =3D true
> wal_sync_method =3D open_sync
> wal_buffers =3D 512
> checkpoint_segments =3D 30
> effective_cache_size =3D 10000
> work_mem =3D <default> (1024 i think?)
>=20
> box:
> Linux 2.6.9-11EL (CentOS 4.1)
> 2x Xeon 3.4 HT
> 2GB of RAM (but Apache and other services are running)
> 4 disk raid 10 (74G Raptor) for data
> 4 disk raid 10 (7200rpm) for WAL
>=20
> other then throwing more spindles at the problem, any suggestions?
>=20
> Thanks,
> Aaron
>=20
> --
> Aaron Turner
> http://synfin.net/
>=20
> ---------------------------(end of=20
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>=20


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux