On 17 November 2011, 15:17, Aidan Van Dyk wrote:
> With a single query, the query can only run as fast as the single
> stream of requests can be satisfied. And as the next read is issued
> as soon as the previous is done (the kernel readahead/buffering the
> seq scan helps here), your iostat is going to show 100% util, because
> the there is always the next read "in progress", even if the average
> queue size is low (1). If you had a 24 spindle array, you could add
> another 20 queries, and you could see the queue size go up, but the
> util would still only be 100%, latency would stay about the same, even
> though your throughput could be 20 times greater.

This is probably the reason why interpreting iostat results is tricky. It's quite straightforward with a single drive, but once you get to arrays it's much more difficult. The %util may stay at 100% even though the I/O is not actually saturated, because some of the drives may just sit there doing nothing. That's why it's important to look at await and svctm too - when await >> svctm, it's another sign of saturation.

Tory, you've mentioned you're on an 8-drive RAID5, but all the iostat results you've posted are for "sda". I'm used to "sda" being a regular drive, not an array - are you sure it's the right device? Are you using a controller or sw-raid? With a sw-based RAID you can easily see stats for each of the member drives (list the members in /proc/mdstat, then watch them with "iostat -x 1"), so it's much easier to see what's going on inside the array.

> So, as long as you have a single query scanning that entire 83GB
> table, and that table has to come from disk (i.e. not cached kernel
> buffers in ram), you're going to be limited by the amount of time it
> takes to read that table in 8K chunks.

I don't think he's doing that - the explain plan he posted earlier showed a bitmap index scan, and as the table is a "log" of actions (just growing, ordered by log_time) this is pretty much the best plan available. So it reads only the interesting part of the table (not the whole 83GB), and it does so sequentially.

> Options for improving it are:
>
> 1) Multicolumn index (depending on insert/update/delete patterns)

I don't think this is going to help him (unless the query is much more complicated than he presented here). It might get interesting with index-only scans and an index on (log_time, uid), but index-only scans are only in 9.2dev.

> 2) partition by date (depending on query types)

No, this is not going to help him much either, as he's already scanning only the interesting part of the table (thanks to the bitmap index scan). It might eliminate the first step (reading the index and building the bitmap), but that's not the dominant part of the execution time - it takes about 8s while the whole query takes 127s.

> 3) rollup views of history (depending on query types)

This is probably the most promising path - prebuild intermediate results for each day, then aggregate just those intermediate results later. I've already described some ideas in my previous posts, and there's a minimal sketch of the idea at the end of this mail.

> 4) trigger based mat-view style rollups (depending on
> insert/update/delete patterns coupled with query types)

Not sure this can work with count(distinct) - you'd either have to keep all the distinct values (which leads back to (3)) or maintain separate counters for every interval you're interested in (day, hour, ...). Anyway, this is going to be much more complicated than (3), so I wouldn't use it unless I really needed continuously updated stats.

Tomas
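
PS: Here's the sketch of (3) I promised above. The table and column names (activity_log, log_time, uid) are just my assumptions - I'm guessing the slow query is roughly a count(distinct uid) over a log_time range - so adjust it to the actual schema. The idea is to keep one row per (day, uid) and aggregate that instead of the raw log:

  -- intermediate results: one row per (day, uid)
  CREATE TABLE log_daily_uids (
      log_day  date    NOT NULL,
      uid      integer NOT NULL,
      PRIMARY KEY (log_day, uid)
  );

  -- refresh one day's worth of intermediate results (e.g. from cron
  -- right after midnight)
  INSERT INTO log_daily_uids (log_day, uid)
  SELECT log_time::date, uid
    FROM activity_log
   WHERE log_time >= date '2011-11-16'
     AND log_time <  date '2011-11-17'
   GROUP BY 1, 2;

  -- the expensive count(distinct) then touches one row per (day, uid)
  -- instead of one row per log entry
  SELECT count(DISTINCT uid)
    FROM log_daily_uids
   WHERE log_day >= date '2011-11-01'
     AND log_day <  date '2011-11-17';

Note that you can't just sum per-day distinct counts (a uid active on two days would be counted twice), which is why the rollup has to keep the uids themselves - but one row per (day, uid) should still be a tiny fraction of the raw log.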