How much disk IO is actually going on when the CPU shows 70% IOWAIT? Far too
much of the CPU's time is being spent in IOWAIT; it shouldn't be anywhere near
that high. I think you really should consider trying an alternative disk
controller. A quick way to separate the disk/controller from the filesystem is
sketched inline below, just after your iostat output.

adrian

2009/8/4 smaugadi <adi@xxxxxxxxxxxx>:
>
> Dear Adrian and Heinz,
> Sorry for the delayed reply, and thanks for all the help so far.
> I have tried changing the file system (ext2 and ext3) and changed the
> partitioning geometry (fdisk -H 224 -S 56), as I read that this would
> improve performance with SSDs.
> I tried ufs, aufs and even coss (downgrading to 2.6). (By the way, the
> average object size is 13 KB.)
> And failed!
>
> From system monitoring during the squid degradation I saw:
>
> /usr/local/bin/iostat -dk -x 1 1000 sdb
>
> Device:  rrqm/s  wrqm/s   r/s   w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz     await   svctm   %util
> sdb        0.00    0.00  0.00  4.00    0.00   72.00     36.00    155.13  25209.75  250.25  100.10
> sdb        0.00    0.00  0.00  4.00    0.00   16.00      8.00    151.50  26265.50  250.50  100.20
> sdb        0.00    0.00  0.00  3.00    0.00   12.00      8.00    147.49  27211.33  333.33  100.00
> sdb        0.00    0.00  0.00  4.00    0.00   32.00     16.00    144.54  28311.25  250.25  100.10
> sdb        0.00    0.00  0.00  4.00    0.00  100.00     50.00    140.93  29410.25  250.25  100.10
> sdb        0.00    0.00  0.00  4.00    0.00   36.00     18.00    137.00  30411.25  250.25  100.10
> sdb        0.00    0.00  0.00  2.00    0.00    8.00      8.00    133.29  31252.50  500.50  100.10
>
> As soon as the service time (svctm) goes above 200 ms the problems start,
> and the total time for service (time in queue + service time, i.e. await)
> goes all the way up to about 32 seconds.
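
Those svctm numbers, with only a handful of small writes per second, point at
the device or controller rather than at squid or the filesystem. A quick way
to check that directly is a small-block direct-IO write test to a scratch file
on the cache partition. This is only a sketch: the mount point /cache1 below
is a guess at your setup, so substitute your own cache directory and delete
the test file afterwards:

  dd if=/dev/zero of=/cache1/dd.test bs=4k count=10000 oflag=direct
  rm -f /cache1/dd.test

dd prints the elapsed time and throughput when it finishes; if roughly 40 MB
of 4 KB direct writes takes more than a few seconds on an otherwise idle box,
that is a strong hint the SSD/controller itself is the bottleneck, regardless
of which cache_dir type or filesystem you choose.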
>
> This is from mpstat at the same time:
>
> Time         CPU  %user  %nice   %sys  %iowait  %irq  %soft  %steal   %idle   intr/s
> 09:33:58 AM  all   3.00   0.00   2.25    84.02  0.12   2.75    0.00    7.87  9782.00
> 09:33:58 AM    0   3.98   0.00   2.99    72.64  0.00   3.98    0.00   16.42  3971.00
> 09:33:58 AM    1   2.01   0.00   1.01    80.40  0.00   1.51    0.00   15.08  1542.00
> 09:33:58 AM    2   2.51   0.00   2.01    92.96  0.00   2.51    0.00    0.00  1763.50
> 09:33:58 AM    3   3.02   0.00   3.02    90.95  0.00   3.02    0.00    0.00  2506.00
>
> 09:34:00 AM  all   0.50   0.00   0.25    74.12  0.00   0.62    0.00   24.50  3833.50
> 09:34:00 AM    0   0.50   0.00   0.50     0.00  0.00   1.00    0.00   98.00  2015.00
> 09:34:00 AM    1   0.50   0.00   0.00    98.51  0.00   1.00    0.00    0.00   544.50
> 09:34:00 AM    2   0.50   0.00   0.00    99.50  0.00   0.00    0.00    0.00   507.00
> 09:34:00 AM    3   0.50   0.00   0.00    99.00  0.00   0.50    0.00    0.00   766.50
>
> 09:34:02 AM  all   0.12   0.00   0.25    74.53  0.00   0.12    0.00   24.97  1751.50
> 09:34:02 AM    0   0.00   0.00   0.00     0.00  0.00   0.00    0.00  100.00  1155.50
> 09:34:02 AM    1   0.00   0.00   0.50    99.50  0.00   0.00    0.00    0.00   230.50
> 09:34:02 AM    2   0.00   0.00   0.00   100.00  0.00   0.00    0.00    0.00   220.00
> 09:34:02 AM    3   0.00   0.00   0.50    99.50  0.00   0.00    0.00    0.00   146.00
>
> 09:34:04 AM  all   1.25   0.00   1.50    74.97  0.00   0.00    0.00   22.28  1607.50
> 09:34:04 AM    0   5.47   0.00   5.47     0.00  0.00   0.00    0.00   89.05  1126.00
> 09:34:04 AM    1   0.00   0.00   0.00   100.00  0.00   0.00    0.00    0.00   158.50
> 09:34:04 AM    2   0.00   0.00   0.50    98.51  0.50   0.50    0.00    0.00   175.50
> 09:34:04 AM    3   0.00   0.00   0.00   100.00  0.00   0.00    0.00    0.00   147.00
>
> Well, sometimes you eat the bear and sometimes the bear eats you.
>
> Do you have any more ideas?
> Regards,
> Adi.
>
>
> Adrian Chadd-3 wrote:
>>
>> 2009/8/2 Heinz Diehl <htd@xxxxxxxxxxxxxxxxx>:
>>
>>> 1. Change cache_dir in squid from ufs to aufs.
>>
>> That is almost always a good idea for any decent performance under any
>> sort of concurrent load. I'd like proof otherwise - if one finds it,
>> it indicates something which should be fixed.
>>
>>> 2. Format /dev/sdb1 with "mkfs.xfs -f -l lazy-count=1,version=2 -i attr=2
>>> -d agcount=4"
>>> 3. Mount it afterwards using
>>> "rw,noatime,logbsize=256k,logbufs=2,nobarrier" in fstab.
>>> 4. Use cfq as the standard scheduler with the Linux kernel.
>>
>> Just out of curiosity, why these settings? Do you have any research
>> which shows this?
>>
>>> (Btw: on my systems, squid-2.7 is noticeably _a lot_ slower than squid-3
>>> if the object is not in cache...)
>>
>> This is an interesting statement. I can't think of any specific reason
>> why squid-2.7 should perform worse than Squid-3 in this instance. This
>> is the kind of "works by magic" stuff which deserves investigation so
>> the issue(s) can be fully understood. Otherwise you may find that a
>> regression creeps up in later Squid-3 versions because the issues were
>> never fully understood and documented, and some coder makes a change
>> which they think won't have as much of an effect as it does. It has
>> certainly happened before in Squid. :)
>>
>> So, "more information please."
>>
>>
>> Adrian
>>
>
> --
> View this message in context: http://www.nabble.com/Squid-high-bandwidth-IO-issue-%28ramdisk-SSD%29-tp24775448p24803136.html
> Sent from the Squid - Users mailing list archive at Nabble.com.
>
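
For concreteness, Heinz's points 1, 3 and 4 above translate into roughly the
following squid.conf line, fstab entry and scheduler setting (point 2 is
already a complete mkfs.xfs command as quoted). This is only a sketch: the
mount point, cache size and L1/L2 directory counts are illustrative guesses,
not tested values from this thread, and the device name is taken from the
iostat output above:

  # squid.conf: aufs cache_dir instead of ufs (20000 MB and 16/256 dirs are placeholders)
  cache_dir aufs /cache1 20000 16 256

  # /etc/fstab: the XFS mount options from point 3 (mount point /cache1 assumed)
  /dev/sdb1  /cache1  xfs  rw,noatime,logbsize=256k,logbufs=2,nobarrier  0  0

  # point 4: select the cfq elevator for the cache device at runtime
  echo cfq > /sys/block/sdb/queue/scheduler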