Re: Slow query due to slow I/O

Bryce Covert <bryce@xxxxxxxxxxxxxxxxxxxxxxxxx> · Thu, 12 Dec 2013 13:49:37 -0800

It's strange that it isn't sequential at least for the electric usage, 
as i've clustered using the index that it's using..

I had work_mem set to 128mb. I tried bumping it to 1024mb, and I don't 
think I see a in the query plan. 

Would you think upgrading to 9.2 would help much here? Using a covered 
index, I imagine reads would be limited quite a bit.

Thanks again,

Bryce

   	Claudio Freire

  December 12, 2013
 1:35 PM

On Thu, Dec 12, 2013 at
 6:16 PM, Bryce Covert <bryce@xxxxxxxxxxxxxxxxxxxxxxxxx>
 wrote:

Thanks a lot for the help. I'm not familiar with explain buffers, but 
here's the results:

 Limit  (cost=0.00..648.71 rows=50 width=8) (actual 
time=653.681..52328.707 rows=50 loops=1)

   Buffers: shared hit=7875 read=9870

   ->  GroupAggregate  (cost=0.00..55672.36 rows=4291 width=8) 
(actual time=653.671..52328.480 rows=50 loops=1)

         Filter: ((sum(electricusage.usage) >= 3284::numeric) AND 
(sum(electricusage.usage) <= 3769::numeric))

         Buffers: shared hit=7875 read=9870

         ->  Nested Loop  (cost=0.00..55262.93 rows=34506 width=8) 
(actual time=432.129..52200.465 rows=30711 loops=1)

               Buffers: shared hit=7875 read=9870

               ->  Index Scan using 
premiseaccount_bucket_58c70392619aa36f on premiseaccount premiseaccount 
 (cost=0.00..15433.71 rows=4291 width=4) (actual time=338.160..10014.780
 rows=3437 loops=1)

                     Index Cond: ((bucket)::text = 
'85349_single-family'::text)

                     Buffers: shared hit=744 read=2692

               ->  Index Scan using electricusage_premise_account_id 
on electricusage electricusage  (cost=0.00..9.17 rows=9 width=8) (actual
 time=11.430..12.235 rows=9 loops=3437)

                     Index Cond: (premise_account_id = 
premiseaccount.id)

                     Filter: (from_date >= '2012-11-20 
00:00:00+00'::timestamp with time zone)

                     Buffers: shared hit=7131 read=7178

 Total runtime: 52329.028 ms

(15 rows)

and the iostat results...

Linux 3.11.6-x86_64-linode35 (preview-aps-new)     12/12/2013     
_x86_64_    (8 CPU)

... 

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

xvda              0.10    47.10  152.40    7.80     2.20     0.21    
30.92     1.36    8.50    8.03   17.53   6.14  98.33

xvdb              0.00     0.00    0.00    0.00     0.00     0.00     
0.00     0.00    0.00    0.00    0.00   0.00   0.00

This means it's doing random I/O, and that your 
disk is a single 7200RPM drive (152 r/s being typical for that 
hardware).

You can improve this by either:

1 - Turning that random I/O pattern into 
sequential, or
2 - Getting better I/O.

I'll assume 2 isn't available to you 
on linode, so for 1, you could try lowering effective_cache_size 
substantially. It seems you're not getting nearly as much caching as you
 think (ie 2GB). However, I doubt there's a plan that can get you 
significantly better performance given your hardware.

You may shave a few seconds, though, 
if you increase work_mem. It seems it should have used a bitmap index 
scan for at least one of the index scans there, and a low work_mem could
 be what's limiting the planner's possibilities. What are your settings 
in that area?

   	Bryce Covert

  December 12, 2013
 1:16 PM

Hey Claudio,

Thanks a lot for the help. I'm not familiar with explain buffers, but 
here's the results:

 Limit  (cost=0.00..648.71 rows=50 width=8) (actual 
time=653.681..52328.707 rows=50 loops=1)

   Buffers: shared hit=7875 read=9870

   ->  GroupAggregate  (cost=0.00..55672.36 rows=4291 width=8) 
(actual time=653.671..52328.480 rows=50 loops=1)

         Filter: ((sum(electricusage.usage) >= 3284::numeric) AND 
(sum(electricusage.usage) <= 3769::numeric))

         Buffers: shared hit=7875 read=9870

         ->  Nested Loop  (cost=0.00..55262.93 rows=34506 width=8) 
(actual time=432.129..52200.465 rows=30711 loops=1)

               Buffers: shared hit=7875 read=9870

               ->  Index Scan using 
premiseaccount_bucket_58c70392619aa36f on premiseaccount premiseaccount 
 (cost=0.00..15433.71 rows=4291 width=4) (actual time=338.160..10014.780
 rows=3437 loops=1)

                     Index Cond: ((bucket)::text = 
'85349_single-family'::text)

                     Buffers: shared hit=744 read=2692

               ->  Index Scan using electricusage_premise_account_id 
on electricusage electricusage  (cost=0.00..9.17 rows=9 width=8) (actual
 time=11.430..12.235 rows=9 loops=3437)

                     Index Cond: (premise_account_id = 
premiseaccount.id)

                     Filter: (from_date >= '2012-11-20 
00:00:00+00'::timestamp with time zone)

                     Buffers: shared hit=7131 read=7178

 Total runtime: 52329.028 ms

(15 rows)

and the iostat results...

Linux 3.11.6-x86_64-linode35 (preview-aps-new)     12/12/2013     
_x86_64_    (8 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

xvda              6.94    65.68   12.16    7.64     0.36     0.29    
67.06     0.44   22.11    3.56   51.66   1.15   2.28

xvdb              0.00     0.00    0.00    0.00     0.00     0.00    
10.61     0.00    1.82    1.82    0.00   1.82   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

xvda              0.10    47.10  152.40    7.80     2.20     0.21    
30.92     1.36    8.50    8.03   17.53   6.14  98.33

xvdb              0.00     0.00    0.00    0.00     0.00     0.00     
0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

xvda              0.20     2.30  212.10    0.70     3.22     0.01    
31.09     1.47    6.88    6.90    2.86   4.63  98.62

xvdb              0.00     0.00    0.00    0.00     0.00     0.00     
0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

xvda              0.20     8.30  183.20    5.10     2.46     0.05    
27.31     1.68    8.85    6.68   86.78   5.24  98.70

xvdb              0.00     0.00    0.00    0.00     0.00     0.00     
0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

xvda              0.10     0.00  165.70    0.00     2.36     0.00    
29.20     1.46    8.86    8.86    0.00   5.95  98.63

xvdb              0.00     0.00    0.00    0.00     0.00     0.00     
0.00     0.00    0.00    0.00    0.00   0.00   0.00

I'm not sure how to process this except that 2.5MB/s seems really slow, 
and it looks like it is using postgres' cache quite a bit.

Thanks,

Bryce

   	Claudio Freire

  December 12, 2013
 12:48 PM

Well, did
 you run benchmarks? How many IOPS do you get from the volumes?

Try running "iostat -x -m -d 10" while the slow 
query is running and pasting the results (or a relevant sample of them).

Also, do run "explain (analyze, 
buffers)" instead of plain "explain analyze".

   	Bryce Covert

  December 12, 2013
 12:20 PM

I don't have much info on disks, since this is a virtual server on 
linode. It is running ubuntu 12.04, 8cpus, 4GB ram, 95GB ext3 volume 
(noatime). Hopefully that's helpful. 

Bryce

   	Claudio Freire

  December 12, 2013
 12:15 PM
  On Thu, Dec 12, 2013 at 
5:10 PM, Bryce Covert

For this kind of 
diagnostic, you need to include hardware details.

OS? Disks? 
RAID?