Re: Request for help with slow query

salah jubeh <s_jubeh@xxxxxxxxx> · Mon, 29 Oct 2012 12:18:19 -0700 (PDT)

Did you try to add an index on filesystem_id 

  From: "Woolcock, Sean" <Sean.Woolcock@xxxxxxx>
 To: "pgsql-performance@xxxxxxxxxxxxxx" <pgsql-performance@xxxxxxxxxxxxxx> 
 Sent: Monday, October 29, 2012 6:41 PM
 Subject:  Request for help with slow query

Hi, thanks for any help. I've tried to be thorough, but let me know if I should
provide more information.

A description of what you are trying to achieve and what results you expect:
    I have a large (3 million row) table called "tape" that represents files,
    which I join to a small (100 row) table called "filesystem" that represents
    filesystems.  I have a web interface that allows you to sort by a number of
    fields in the tape table and view the results 100 at a time (using LIMIT
    and OFFSET).

    The data only changes hourly and I do a "vacuum analyze" after all changes.

    The tables are defined as:

        create table filesystem (
            id           serial primary key,
            host   
      varchar(256),
            storage_path varchar(2048) not null check (storage_path != ''),
            mounted_on   varchar(2048) not null check (mounted_on != ''),
            constraint   unique_fs unique(host, storage_path)
        );
        create table tape (
            id               serial primary key,
            volser           char(255) not null check (volser != ''),
            path             varchar(2048)   not null check (path != ''),
            scratched        boolean         not null default
 FALSE,
            last_write_date  timestamp       not null default current_timestamp,
            last_access_date timestamp       not null default current_timestamp,
            filesystem_id    integer references filesystem not null,
            size             bigint not null check (size >= 0),
            worm_status      char,
            encryption       char,
            job_name         char(8),
            job_step         char(8),
            dsname           char(17),

           recfm            char(3),
            block_size       int,
            lrecl            int,
            constraint filesystem_already_has_that_volser unique(filesystem_id, volser)
        );

    An example query that's running slowly for me is:

        select tape.volser,
               tape.path,
               tape.scratched,
               tape.size,
               extract(epoch from tape.last_write_date) as last_write_date,
               extract(epoch from tape.last_access_date) as
 last_access_date
            from tape
            inner join filesystem
                on (tape.filesystem_id = filesystem.id)
            order by last_write_date desc
            limit 100
            offset 100;

    On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.

    Here's the explain output:
                                                                    QUERY PLAN

 ---------------------------------------------------------------------------------------------------------------------------------------
        Limit  (cost=3226201.13..3226201.38 rows=100 width=308) (actual time=66311.929..66312.053 rows=100 loops=1)
        ->  Sort  (cost=3226200.88..3234250.28 rows=3219757 width=308) (actual time=66311.826..66311.965 rows=200 loops=1)
                Sort Key: date_part('epoch'::text, tape.last_write_date)
                ->  Hash Join  (cost=3.26..242948.97 rows=3219757 width=308) (actual time=3.165..31680.830 rows=3219757 loops=1)
                    Hash Cond: ("outer".filesystem_id = "inner".id)
                    ->  Seq Scan on
 tape  (cost=0.00..178550.57 rows=3219757 width=312) (actual time=2.824..18175.863 rows=3219757 loops=1)
                    ->  Hash  (cost=3.01..3.01 rows=101 width=4) (actual time=0.204..0.204 rows=101 loops=1)
                            ->  Seq Scan on filesystem  (cost=0.00..3.01 rows=101 width=4) (actual time=0.004..0.116 rows=101 loops=1)
        Total runtime: 66553.643 ms

    Here's a depesz link with that output: http://explain.depesz.com/s/AUR

Things I've tried:

    1. I added an index on last_write_date with:

            create index tape_last_write_date_idx on tape(last_write_date);

       and there was no improvement in query time.

    2. I
 bumped:
             effective_cache_size to 1/2 system RAM (1GB)
             shared_buffers to 1/4 system RAM (512MB)
             work_mem to 10MB
       and there was no improvement in query time.

    3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
       using the same hardware and it was about 5 times faster (nice work,
       whoever did that!).  Unfortunately upgrading is not an option, so this
       is more of an anecdote. I would think the query could go much faster
       in either environment with some optimization.

The EXACT PostgreSQL version you are running:
    PostgreSQL 8.1.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (SUSE Linux)

How you
 installed PostgreSQL:
    Standard SuSE SLES 10-SP3 RPMs:
        postgresql-devel-8.1.17-0.3
        postgresql-pl-8.1.17-0.4
        postgresql-libs-8.1.17-0.3
        postgresql-8.1.17-0.3
        postgresql-server-8.1.17-0.3
        postgresql-contrib-8.1.17-0.3

Changes made to the settings in the postgresql.conf file:
    Only the memory changes mentioned above.

Operating system and version:
    Linux acp1 2.6.16.60-0.54.5-default #1 Fri Sep 4 01:28:03 UTC 2009 i686 i686 i386 GNU/Linux

    SLES 10-SP3

What program you're using to connect to PostgreSQL:
    Perl DBI
    Perl v5.8.8

What version of the ODBC/JDBC/ADO/etc driver you're using, if any:
    perl-DBD-Pg 1.43

If you're using a
 connection pool, load balancer or application server, which one you're using and its version:
    None.

Is there anything remotely unusual in the PostgreSQL server logs?
    No, they're empty.

CPU manufacturer and model:
    Intel Celeron CPU 440 @ 2.00GHz

Amount and size of RAM installed:
    2GB RAM

Storage details (important for performance and corruption questions):

    Do you use a RAID controller?
        No.
    How many hard disks are connected to the system and what types are they?
        We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
    How are your disks arranged for storage?
        Postgres lives on the same 100GB ext3 partition as the OS.

Thanks,
Sean

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance