Re: Slow performance when querying millions of rows

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> Dne 28.6.2011 23:28, Craig McIlwee napsal(a):
> > Daily table explain analyze: http://explain.depesz.com/s/iLY
> > Half month table explain analyze: http://explain.depesz.com/s/Unt
>
> Are you sure those two queries are exactly the same? Because the daily
> case output says the width is 50B, while the half-month case says it's
> 75B. This might be why the sort/aggregate steps are switched, and that
> increases the amount of data so that it has to be sorted on disk (which
> is why the half-month is so much slower).
>
> Haven't you added some columns to the half-month query?

The daily tables were created using CREATE TABLE AS from the half month tables, structure is the same with the exception of fill factor.  Queries are identical except for the name of the master table that they select from.

>
> > Postgres version:
> > PostgreSQL 8.4.8, compiled by Visual C++ build 1400, 32-bit
> >
> > History:
> > None, this is a new database and application
> >
> > Hardware:
> > - 2 Intel Xeon 2.13GHz processors with 8 cores each
> > - 8GB RAM
> > - Disks are in a RAID 5 configuration with a PERC H700 Integrated RAID
> > Controller 512MB Cache
> > - 5 disks, Seagate 7200 RPM SAS, 500GB each for a total capacity of
> > about 2TB
> > - Windows Server 2008 R2 64-bit (but 32-bit postgres)
> > - Hardware upgrades arent an option at this point due to budget and time
> > constraints
>
> Not much experience with PostgreSQL on Windows, but this looks good to
> me. Not sure if RAID5 is a good choice, especially because of write
> performance - this is probably one of the reasons why the disk sort is
> so slow (in the half-month case).

Yes, the data import is painfully slow but I hope to make up for that with the read performance later.

>
> And it's nice you have 8 cores, but don't forget each query executes on
> a single background process, i.e. it may use single core. So the above
> query can't use 8 cores - that's why the in-memory sort takes so long I
> guess.
>
> > Maintenance Setup:
> > Autovacuum is disabled for these tables since the data is never
> > updated.  The tables that we are testing with at the moment will not
> > grow any larger and have been both clustered and analyzed.  They were
> > clustered on the vds_detector_data_timestamp_idx index.
> >
> > GUC Settings:
> > effective_cache_size: 2048MB
> > work_mem: 512MB
> > shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
> > plan and took the same amount of time to execute give or take a few
> seconds
> >
> > Summary:
> >
> > The time to get the raw data (before aggregation and sorting) is
> > relatively similar between the daily and half month tables.  It would
> > appear that the major difference is the ordering of sort and
> > aggregation, the daily tables aggregate first so the amount of data
> > sorted is significantly less.
>
> Yes, the ordering is the problem. The amount of data to sort is so huge
> (3GB) it does not fit into work_mem and has to be sorted on disk. Not
> sure why this happens, the only difference I've noticed is the 'width'
> (50B vs. 75B). Are those two queries exactly the same?
>
> > Since the daily tables are only 360MB, I would hope that the entire
> > table could be pulled into memory with one large sequential read.  Of
> > course this assumes that the file pieces are stored contiguously, but
> > auto defrag is enabled and shows low fragmentation so Im trusting (as
> > much as one can) Windows to do the right thing here.  My drives have a
> > 150MB/s sustained max throughput, and considering that data is spread
> > across 5 drives I would hope to at least be able to reach the single
> > disk theoretical limit and read an entire table plus the index into
> > memory about 4 to 5 seconds.  Based on the analyze output, each daily
> > table averages 6 to 7 seconds, so Im pretty close there and maybe just
> > limited by disk speed?
>
> Well, you have 30 partitions and 7 seconds for each means 210 secons in
> total. Which is about the time you get (before the aggregate/sort).
>
> You have to check where the bottleneck is - is it the I/O or CPU? I'd
> guess the CPU, but I may be wrong. On unix I'd use something like
> iostat/vmstat/top to see what's going on - not sure what to use on
> Windows. I guess there is a some console or maybe Process Explorer from
> sysinternals.
>
> > In both cases, the row estimates vs actual are way off.  Ive increased
> > statistics on the reading_timestamp and road_name columns to 100 and
> > then 1000 with no change.  I ran an ANALYZE after each statistics
> > change.  Should I be upping stats on the non-indexed columns as well?
> > Ive read documentation that says I should be able to set statistics
> > values for an entire table as opposed to per column, but havent found
> > how to do that.  I guess I was either too lazy to update statistics on
> > each column or just didnt think it would help much.
>
> The estimates seem pretty good to me - 10x difference is not that much.
> Could be better, but I don't think you can get a better plan, is seems
> very reasonable to me.
>
> > So, any pointers for performance improvement?
>
> Three ideas what might help
>
> 1) partial indexes
>
> How much do the parameters in the query change? If there are parameters
> that are always the same, you may try to create partial indexes. For
> example if the 'vdd.volume' always has to be '0', then you can create
> the index like this
>
>  CREATE INDEX vds_detector_data_dir_idx
>    ON vds_detector_data
>    USING btree
>    (road_dir)
>    WHERE (vdd.volume!=0);
>
> That way only the rows with 'vdd.volume!=0' will be included in the
> index, the index will be smaller and the bitmap will be created faster.
> Similarly for the other conditions. The smaller the index will be, the
> faster the bitmap creation.
>
> If all the conditions may change, or if the index size does not change
> much, you can't do this.

The 0 volume is the only thing that will always be present, but those records do account for 10 to 15% of the data.  I'll give this a shot, I'm really interested in seeing what impact this had.  For some reason I was under the impression that partial indexes were used for text searches, so I completely overlooked this.

>
> 2) prebuilt results
>
> Another option is precomputation of the 'per partition results' - if you
> know what the conditions are going to be, you can precompute them and
> then query just those (much smaller) tables. But this is very
> application specific.
>
> Again, if the all the conditions change, you can't do this.

This has been one of the toughest issues.  Due to the filtering capabilities, it's just not possible to precalculate anything.

>
> 3) distribute the queries
>
> As I've mentioned, PostgreSQL does not distribute the queries on
> multiple CPU cores, but you can do that on your own at the application
> level.
>
> For example I see the GROUP BY clause contains 'date, hour, min' so you
> can compute the results for each partition separately (in a different
> thread, using a separate connection) and then 'append' them.
>
> Yes, you'll need to keep some metadata to do this efficiently (e.g. list
> of partitions along with from/to timestamps), but you should do this
> anyway I guess (at least I do that when partitioning tables, it makes
> the management much easier).

I noticed this too after a little more testing, there are some serious performance gains to be had here.  I started with a single day query and it took about 15 seconds.  Next was 5 simultaneous queries all at about 30 seconds each and then 10 queries at 50 seconds each.

>
> Not sure if you can do this with the other queries :-(

Not all, but many.  The query in question is the most beastly of them all, so I'm pretty happy to have some strategy for improvement.

>
> regards
> Tomas
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Thanks for the help.
Craig

This e-mail communication (including any attachments) may contain confidential and/or privileged material intended solely for the individual or entity to which it is addressed.
P - Think before you print.

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux