On 17 November 2011, 15:17, Aidan Van Dyk wrote:
> With a single query, the query can only run as fast as the single
> stream of requests can be satisfied. And as the next read is issued
> as soon as the previous is done (the kernel readahead/buffering the
> seq scan helps here), your iostat is going to show 100% util, because
> the there is always the next read "in progress", even if the average
> queue size is low (1). If you had a 24 spindle array, you could add
> another 20 queries, and you could see the queue size go up, but the
> util would still only be 100%, latency would stay about the same, even
> though your throughput could be 20 times greater.

This is probably the reason why interpreting iostat results is tricky. It's quite straightforward with a single drive, but once you get to arrays it's much more difficult. The %util may stay at 100% even though the I/O is not actually saturated, because some of the drives may just sit there doing nothing. That's why it's important to look at await and svctm too - when await >> svctm, it's another sign of saturation.

Tory, you've mentioned you're on an 8-drive RAID5, but all the iostat results you've posted are for "sda". I'm used to "sda" being a regular drive, not an array - are you sure it's the right device? Are you using a controller or sw-raid? With a sw-based RAID you can easily see stats for each of the member drives (list the members in /proc/mdstat, then watch them with "iostat -x 1"), so it's much easier to see what's going on inside the array.

> So, as long as you have a single query scanning that entire 83GB
> table, and that table has to come from disk (i.e. not cached kernel
> buffers in ram), you're going to be limited by the amount of time it
> takes to read that table in 8K chunks.

I don't think he's doing that - the explain plan he posted earlier showed a bitmap index scan, and as the table is a "log" of actions (just growing, ordered by log_time) this is pretty much the best plan available. So it reads only the interesting part of the table (not the whole 83GB), and it does so sequentially.

> Options for improving it are:
>
> 1) Multicolumn index (depending on insert/update/delete patterns)

I don't think this is going to help him (unless the query is much more complicated than he presented here). It might get interesting with index-only scans and an index on (log_time, uid), but index-only scans are only in 9.2dev.

> 2) partition by date (depending on query types)

No, this is not going to help him much either, as he's already scanning only the interesting part of the table (thanks to the bitmap index scan). It might eliminate the first step (reading the index and building the bitmap), but that's not the dominant part of the execution time - it takes about 8s while the whole query takes 127s.

> 3) rollup views of history (depending on query types)

This is probably the most promising path - prebuild intermediate results for each day, then aggregate just those intermediate results later. I've already described some ideas in my previous posts, and there's a minimal sketch of the idea at the end of this mail.

> 4) trigger based mat-view style rollups (depending on
> insert/update/delete patterns coupled with query types)

Not sure this can work with count(distinct) - you'd either have to keep all the distinct values (which leads back to (3)) or maintain separate counters for every interval you're interested in (day, hour, ...). Anyway, this is going to be much more complicated than (3), so I wouldn't use it unless I really needed continuously updated stats.

Tomas
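
PS: Here's the sketch of (3) I promised above. The table and column names (activity_log, log_time, uid) are just my assumptions - I'm guessing the slow query is roughly a count(distinct uid) over a log_time range - so adjust it to the actual schema. The idea is to keep one row per (day, uid) and aggregate that instead of the raw log:

  -- intermediate results: one row per (day, uid)
  CREATE TABLE log_daily_uids (
      log_day  date    NOT NULL,
      uid      integer NOT NULL,
      PRIMARY KEY (log_day, uid)
  );

  -- refresh one day's worth of intermediate results (e.g. from cron
  -- right after midnight)
  INSERT INTO log_daily_uids (log_day, uid)
  SELECT log_time::date, uid
    FROM activity_log
   WHERE log_time >= date '2011-11-16'
     AND log_time <  date '2011-11-17'
   GROUP BY 1, 2;

  -- the expensive count(distinct) then touches one row per (day, uid)
  -- instead of one row per log entry
  SELECT count(DISTINCT uid)
    FROM log_daily_uids
   WHERE log_day >= date '2011-11-01'
     AND log_day <  date '2011-11-17';

Note that you can't just sum per-day distinct counts (a uid active on two days would be counted twice), which is why the rollup has to keep the uids themselves - but one row per (day, uid) should still be a tiny fraction of the raw log.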