Re: Request for help with slow query

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Did you try to add an index on filesystem_id



From: "Woolcock, Sean" <Sean.Woolcock@xxxxxxx>
To: "pgsql-performance@xxxxxxxxxxxxxx" <pgsql-performance@xxxxxxxxxxxxxx>
Sent: Monday, October 29, 2012 6:41 PM
Subject: Request for help with slow query

Hi, thanks for any help. I've tried to be thorough, but let me know if I should
provide more information.

A description of what you are trying to achieve and what results you expect:
    I have a large (3 million row) table called "tape" that represents files,
    which I join to a small (100 row) table called "filesystem" that represents
    filesystems.  I have a web interface that allows you to sort by a number of
    fields in the tape table and view the results 100 at a time (using LIMIT
    and OFFSET).

    The data only changes hourly and I do a "vacuum analyze" after all changes.

    The tables are defined as:

        create table filesystem (
            id          serial primary key,
            host        varchar(256),
            storage_path varchar(2048) not null check (storage_path != ''),
            mounted_on  varchar(2048) not null check (mounted_on != ''),
            constraint  unique_fs unique(host, storage_path)
        );
        create table tape (
            id              serial primary key,
            volser          char(255) not null check (volser != ''),
            path            varchar(2048)  not null check (path != ''),
            scratched        boolean        not null default FALSE,
            last_write_date  timestamp      not null default current_timestamp,
            last_access_date timestamp      not null default current_timestamp,
            filesystem_id    integer references filesystem not null,
            size            bigint not null check (size >= 0),
            worm_status      char,
            encryption      char,
            job_name        char(8),
            job_step        char(8),
            dsname          char(17),
            recfm            char(3),
            block_size      int,
            lrecl            int,
            constraint filesystem_already_has_that_volser unique(filesystem_id, volser)
        );

    An example query that's running slowly for me is:

        select tape.volser,
              tape.path,
              tape.scratched,
              tape.size,
              extract(epoch from tape.last_write_date) as last_write_date,
              extract(epoch from tape.last_access_date) as last_access_date
            from tape
            inner join filesystem
                on (tape.filesystem_id = filesystem.id)
            order by last_write_date desc
            limit 100
            offset 100;

    On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.

    Here's the explain output:
                                                                    QUERY PLAN
        ---------------------------------------------------------------------------------------------------------------------------------------
        Limit  (cost=3226201.13..3226201.38 rows=100 width=308) (actual time=66311.929..66312.053 rows=100 loops=1)
        ->  Sort  (cost=3226200.88..3234250.28 rows=3219757 width=308) (actual time=66311.826..66311.965 rows=200 loops=1)
                Sort Key: date_part('epoch'::text, tape.last_write_date)
                ->  Hash Join  (cost=3.26..242948.97 rows=3219757 width=308) (actual time=3.165..31680.830 rows=3219757 loops=1)
                    Hash Cond: ("outer".filesystem_id = "inner".id)
                    ->  Seq Scan on tape  (cost=0.00..178550.57 rows=3219757 width=312) (actual time=2.824..18175.863 rows=3219757 loops=1)
                    ->  Hash  (cost=3.01..3.01 rows=101 width=4) (actual time=0.204..0.204 rows=101 loops=1)
                            ->  Seq Scan on filesystem  (cost=0.00..3.01 rows=101 width=4) (actual time=0.004..0.116 rows=101 loops=1)
        Total runtime: 66553.643 ms

    Here's a depesz link with that output: http://explain.depesz.com/s/AUR


Things I've tried:

    1. I added an index on last_write_date with:

            create index tape_last_write_date_idx on tape(last_write_date);

      and there was no improvement in query time.

    2. I bumped:
            effective_cache_size to 1/2 system RAM (1GB)
            shared_buffers to 1/4 system RAM (512MB)
            work_mem to 10MB
      and there was no improvement in query time.

    3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
      using the same hardware and it was about 5 times faster (nice work,
      whoever did that!).  Unfortunately upgrading is not an option, so this
      is more of an anecdote. I would think the query could go much faster
      in either environment with some optimization.



The EXACT PostgreSQL version you are running:
    PostgreSQL 8.1.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (SUSE Linux)

How you installed PostgreSQL:
    Standard SuSE SLES 10-SP3 RPMs:
        postgresql-devel-8.1.17-0.3
        postgresql-pl-8.1.17-0.4
        postgresql-libs-8.1.17-0.3
        postgresql-8.1.17-0.3
        postgresql-server-8.1.17-0.3
        postgresql-contrib-8.1.17-0.3

Changes made to the settings in the postgresql.conf file:
    Only the memory changes mentioned above.

Operating system and version:
    Linux acp1 2.6.16.60-0.54.5-default #1 Fri Sep 4 01:28:03 UTC 2009 i686 i686 i386 GNU/Linux

    SLES 10-SP3

What program you're using to connect to PostgreSQL:
    Perl DBI
    Perl v5.8.8

What version of the ODBC/JDBC/ADO/etc driver you're using, if any:
    perl-DBD-Pg 1.43

If you're using a connection pool, load balancer or application server, which one you're using and its version:
    None.

Is there anything remotely unusual in the PostgreSQL server logs?
    No, they're empty.


CPU manufacturer and model:
    Intel Celeron CPU 440 @ 2.00GHz

Amount and size of RAM installed:
    2GB RAM

Storage details (important for performance and corruption questions):

    Do you use a RAID controller?
        No.
    How many hard disks are connected to the system and what types are they?
        We use a single Hitachi HDT72102 SATA drive (250GB) 7200 RPM.
    How are your disks arranged for storage?
        Postgres lives on the same 100GB ext3 partition as the OS.


Thanks,
Sean


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux