Did you try to add an index on filesystem_id
From: "Woolcock, Sean" <Sean.Woolcock@xxxxxxx>
To: "pgsql-performance@xxxxxxxxxxxxxx" <pgsql-performance@xxxxxxxxxxxxxx>
Sent: Monday, October 29, 2012 6:41 PM
Subject: Request for help with slow query
Hi, thanks for any help. I've tried to be thorough, but let me know if I should
provide more information.
A description of what you are trying to achieve and what results you expect:
I have a large (3 million row) table called "tape" that represents files,
which I join to a small (100 row) table called "filesystem" that represents
filesystems. I have a web interface that allows you to sort by a number of
fields in the tape table and view the results 100 at a time (using LIMIT
and OFFSET).
The data only changes hourly and I do a "vacuum analyze" after all changes.
The tables are defined as:
create table filesystem (
id serial primary key,
host varchar(256),
storage_path varchar(2048) not null check (storage_path != ''),
mounted_on varchar(2048) not null check (mounted_on != ''),
constraint unique_fs unique(host, storage_path)
);
create table tape (
id serial primary key,
volser char(255) not null check (volser != ''),
path varchar(2048) not null check (path != ''),
scratched boolean not null default FALSE,
last_write_date timestamp not null default current_timestamp,
last_access_date timestamp not null default current_timestamp,
filesystem_id integer references filesystem not null,
size bigint not null check (size >= 0),
worm_status char,
encryption char,
job_name char(8),
job_step char(8),
dsname char(17),
recfm char(3),
block_size int,
lrecl int,
constraint filesystem_already_has_that_volser unique(filesystem_id, volser)
);
An example query that's running slowly for me is:
select tape.volser,
tape.path,
tape.scratched,
tape.size,
extract(epoch from tape.last_write_date) as last_write_date,
extract(epoch from tape.last_access_date) as last_access_date
from tape
inner join filesystem
on (tape.filesystem_id = filesystem.id)
order by last_write_date desc
limit 100
offset 100;
On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.
Here's the explain output:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3226201.13..3226201.38 rows=100 width=308) (actual time=66311.929..66312.053 rows=100 loops=1)
-> Sort (cost=3226200.88..3234250.28 rows=3219757 width=308) (actual time=66311.826..66311.965 rows=200 loops=1)
Sort Key: date_part('epoch'::text, tape.last_write_date)
-> Hash Join (cost=3.26..242948.97 rows=3219757 width=308) (actual time=3.165..31680.830 rows=3219757 loops=1)
Hash Cond: ("outer".filesystem_id = "inner".id)
-> Seq Scan on tape (cost=0.00..178550.57 rows=3219757 width=312) (actual time=2.824..18175.863 rows=3219757 loops=1)
-> Hash (cost=3.01..3.01 rows=101 width=4) (actual time=0.204..0.204 rows=101 loops=1)
-> Seq Scan on filesystem (cost=0.00..3.01 rows=101 width=4) (actual time=0.004..0.116 rows=101 loops=1)
Total runtime: 66553.643 ms
Here's a depesz link with that output: http://explain.depesz.com/s/AUR
Things I've tried:
1. I added an index on last_write_date with:
create index tape_last_write_date_idx on tape(last_write_date);
and there was no improvement in query time.
2. I bumped:
effective_cache_size to 1/2 system RAM (1GB)
shared_buffers to 1/4 system RAM (512MB)
work_mem to 10MB
and there was no improvement in query time.
3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
using the same hardware and it was about 5 times faster (nice work,
whoever did that!). Unfortunately upgrading is not an option, so this
is more of an anecdote. I would think the query could go much faster
in either environment with some optimization.
The EXACT PostgreSQL version you are running:
PostgreSQL 8.1.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (SUSE Linux)
How you installed PostgreSQL:
Standard SuSE SLES 10-SP3 RPMs:
postgresql-devel-8.1.17-0.3
postgresql-pl-8.1.17-0.4
postgresql-libs-8.1.17-0.3
postgresql-8.1.17-0.3
postgresql-server-8.1.17-0.3
postgresql-contrib-8.1.17-0.3
Changes made to the settings in the postgresql.conf file:
Only the memory changes mentioned above.
Operating system and version:
Linux acp1 2.6.16.60-0.54.5-default #1 Fri Sep 4 01:28:03 UTC 2009 i686 i686 i386 GNU/Linux
SLES 10-SP3
What program you're using to connect to PostgreSQL:
Perl DBI
Perl v5.8.8
What version of the ODBC/JDBC/ADO/etc driver you're using, if any:
perl-DBD-Pg 1.43
If you're using a connection pool, load balancer or application server, which one you're using and its version:
None.
Is there anything remotely unusual in the PostgreSQL server logs?
No, they're empty.
CPU manufacturer and model:
Intel Celeron CPU 440 @ 2.00GHz
Amount and size of RAM installed:
2GB RAM
Storage details (important for performance and corruption questions):
Do you use a RAID controller?
No.
How many hard disks are connected to the system and what types are they?
We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
How are your disks arranged for storage?
Postgres lives on the same 100GB ext3 partition as the OS.
Thanks,
Sean
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
From: "Woolcock, Sean" <Sean.Woolcock@xxxxxxx>
To: "pgsql-performance@xxxxxxxxxxxxxx" <pgsql-performance@xxxxxxxxxxxxxx>
Sent: Monday, October 29, 2012 6:41 PM
Subject: Request for help with slow query
Hi, thanks for any help. I've tried to be thorough, but let me know if I should
provide more information.
A description of what you are trying to achieve and what results you expect:
I have a large (3 million row) table called "tape" that represents files,
which I join to a small (100 row) table called "filesystem" that represents
filesystems. I have a web interface that allows you to sort by a number of
fields in the tape table and view the results 100 at a time (using LIMIT
and OFFSET).
The data only changes hourly and I do a "vacuum analyze" after all changes.
The tables are defined as:
create table filesystem (
id serial primary key,
host varchar(256),
storage_path varchar(2048) not null check (storage_path != ''),
mounted_on varchar(2048) not null check (mounted_on != ''),
constraint unique_fs unique(host, storage_path)
);
create table tape (
id serial primary key,
volser char(255) not null check (volser != ''),
path varchar(2048) not null check (path != ''),
scratched boolean not null default FALSE,
last_write_date timestamp not null default current_timestamp,
last_access_date timestamp not null default current_timestamp,
filesystem_id integer references filesystem not null,
size bigint not null check (size >= 0),
worm_status char,
encryption char,
job_name char(8),
job_step char(8),
dsname char(17),
recfm char(3),
block_size int,
lrecl int,
constraint filesystem_already_has_that_volser unique(filesystem_id, volser)
);
An example query that's running slowly for me is:
select tape.volser,
tape.path,
tape.scratched,
tape.size,
extract(epoch from tape.last_write_date) as last_write_date,
extract(epoch from tape.last_access_date) as last_access_date
from tape
inner join filesystem
on (tape.filesystem_id = filesystem.id)
order by last_write_date desc
limit 100
offset 100;
On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.
Here's the explain output:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3226201.13..3226201.38 rows=100 width=308) (actual time=66311.929..66312.053 rows=100 loops=1)
-> Sort (cost=3226200.88..3234250.28 rows=3219757 width=308) (actual time=66311.826..66311.965 rows=200 loops=1)
Sort Key: date_part('epoch'::text, tape.last_write_date)
-> Hash Join (cost=3.26..242948.97 rows=3219757 width=308) (actual time=3.165..31680.830 rows=3219757 loops=1)
Hash Cond: ("outer".filesystem_id = "inner".id)
-> Seq Scan on tape (cost=0.00..178550.57 rows=3219757 width=312) (actual time=2.824..18175.863 rows=3219757 loops=1)
-> Hash (cost=3.01..3.01 rows=101 width=4) (actual time=0.204..0.204 rows=101 loops=1)
-> Seq Scan on filesystem (cost=0.00..3.01 rows=101 width=4) (actual time=0.004..0.116 rows=101 loops=1)
Total runtime: 66553.643 ms
Here's a depesz link with that output: http://explain.depesz.com/s/AUR
Things I've tried:
1. I added an index on last_write_date with:
create index tape_last_write_date_idx on tape(last_write_date);
and there was no improvement in query time.
2. I bumped:
effective_cache_size to 1/2 system RAM (1GB)
shared_buffers to 1/4 system RAM (512MB)
work_mem to 10MB
and there was no improvement in query time.
3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
using the same hardware and it was about 5 times faster (nice work,
whoever did that!). Unfortunately upgrading is not an option, so this
is more of an anecdote. I would think the query could go much faster
in either environment with some optimization.
The EXACT PostgreSQL version you are running:
PostgreSQL 8.1.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (SUSE Linux)
How you installed PostgreSQL:
Standard SuSE SLES 10-SP3 RPMs:
postgresql-devel-8.1.17-0.3
postgresql-pl-8.1.17-0.4
postgresql-libs-8.1.17-0.3
postgresql-8.1.17-0.3
postgresql-server-8.1.17-0.3
postgresql-contrib-8.1.17-0.3
Changes made to the settings in the postgresql.conf file:
Only the memory changes mentioned above.
Operating system and version:
Linux acp1 2.6.16.60-0.54.5-default #1 Fri Sep 4 01:28:03 UTC 2009 i686 i686 i386 GNU/Linux
SLES 10-SP3
What program you're using to connect to PostgreSQL:
Perl DBI
Perl v5.8.8
What version of the ODBC/JDBC/ADO/etc driver you're using, if any:
perl-DBD-Pg 1.43
If you're using a connection pool, load balancer or application server, which one you're using and its version:
None.
Is there anything remotely unusual in the PostgreSQL server logs?
No, they're empty.
CPU manufacturer and model:
Intel Celeron CPU 440 @ 2.00GHz
Amount and size of RAM installed:
2GB RAM
Storage details (important for performance and corruption questions):
Do you use a RAID controller?
No.
How many hard disks are connected to the system and what types are they?
We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
How are your disks arranged for storage?
Postgres lives on the same 100GB ext3 partition as the OS.
Thanks,
Sean
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance