Dear Adrian, I will try an alternative disk controller and update you with the result.
Regards,
Adi.

Adrian Chadd-3 wrote:
>
> How much disk IO is going on when the CPU shows 70% IOWAIT? Far too
> much. The CPU time spent in IOWAIT shouldn't be that high. I think
> you really should consider trying an alternative disk controller.
>
> adrian
>
> 2009/8/4 smaugadi <adi@xxxxxxxxxxxx>:
>>
>> Dear Adrian and Heinz,
>> Sorry for the delayed reply, and thanks for all the help so far.
>> I have tried changing the file system (ext2 and ext3) and changing the
>> partitioning geometry (fdisk -H 224 -S 56), as I read that this would
>> improve performance with SSD.
>> I tried ufs, aufs and even coss (downgrading to 2.6). (By the way, the
>> average object size is 13 KB.)
>> And failed!
>>
>> From system monitoring during the squid degradation I saw:
>>
>> /usr/local/bin/iostat -dk -x 1 1000 sdb
>> Device:  rrqm/s  wrqm/s  r/s   w/s   rkB/s  wkB/s   avgrq-sz  avgqu-sz  await     svctm   %util
>> sdb      0.00    0.00    0.00  4.00  0.00   72.00   36.00     155.13    25209.75  250.25  100.10
>>
>> Device:  rrqm/s  wrqm/s  r/s   w/s   rkB/s  wkB/s   avgrq-sz  avgqu-sz  await     svctm   %util
>> sdb      0.00    0.00    0.00  4.00  0.00   16.00   8.00      151.50    26265.50  250.50  100.20
>>
>> Device:  rrqm/s  wrqm/s  r/s   w/s   rkB/s  wkB/s   avgrq-sz  avgqu-sz  await     svctm   %util
>> sdb      0.00    0.00    0.00  3.00  0.00   12.00   8.00      147.49    27211.33  333.33  100.00
>>
>> Device:  rrqm/s  wrqm/s  r/s   w/s   rkB/s  wkB/s   avgrq-sz  avgqu-sz  await     svctm   %util
>> sdb      0.00    0.00    0.00  4.00  0.00   32.00   16.00     144.54    28311.25  250.25  100.10
>>
>> Device:  rrqm/s  wrqm/s  r/s   w/s   rkB/s  wkB/s   avgrq-sz  avgqu-sz  await     svctm   %util
>> sdb      0.00    0.00    0.00  4.00  0.00   100.00  50.00     140.93    29410.25  250.25  100.10
>>
>> Device:  rrqm/s  wrqm/s  r/s   w/s   rkB/s  wkB/s   avgrq-sz  avgqu-sz  await     svctm   %util
>> sdb      0.00    0.00    0.00  4.00  0.00   36.00   18.00     137.00    30411.25  250.25  100.10
>>
>> Device:  rrqm/s  wrqm/s  r/s   w/s   rkB/s  wkB/s   avgrq-sz  avgqu-sz  await     svctm   %util
>> sdb      0.00    0.00    0.00  2.00  0.00   8.00    8.00      133.29    31252.50  500.50  100.10
>>
>> As soon as the service time increases above 200 ms, problems start; the
>> total time for service (time in queue + service time) also goes all the
>> way up to 32 sec.
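A rough reading of those numbers: await is the total time a request spends queued plus being serviced, while svctm is the device service time alone, so with svctm around 250 ms and roughly 150 requests queued (avgqu-sz), each new request waits on the order of 25-30 seconds. Both figures look pathological for an SSD. One thing worth checking alongside iostat is which I/O elevator the kernel is using for the cache disk; a minimal sketch, assuming the SSD is the /dev/sdb shown above:

    # the active scheduler is printed in brackets
    cat /sys/block/sdb/queue/scheduler
    # try a simpler elevator for a test run (as root)
    echo noop > /sys/block/sdb/queue/scheduler
    # then watch queue depth and latency again under load
    iostat -dk -x 2 sdb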
>>
>> This is from mpstat at the same time:
>>
>> 09:33:56 AM  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle   intr/s
>> 09:33:58 AM  all  3.00   0.00   2.25  84.02    0.12  2.75   0.00    7.87    9782.00
>> 09:33:58 AM  0    3.98   0.00   2.99  72.64    0.00  3.98   0.00    16.42   3971.00
>> 09:33:58 AM  1    2.01   0.00   1.01  80.40    0.00  1.51   0.00    15.08   1542.00
>> 09:33:58 AM  2    2.51   0.00   2.01  92.96    0.00  2.51   0.00    0.00    1763.50
>> 09:33:58 AM  3    3.02   0.00   3.02  90.95    0.00  3.02   0.00    0.00    2506.00
>>
>> 09:33:58 AM  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle   intr/s
>> 09:34:00 AM  all  0.50   0.00   0.25  74.12    0.00  0.62   0.00    24.50   3833.50
>> 09:34:00 AM  0    0.50   0.00   0.50  0.00     0.00  1.00   0.00    98.00   2015.00
>> 09:34:00 AM  1    0.50   0.00   0.00  98.51    0.00  1.00   0.00    0.00    544.50
>> 09:34:00 AM  2    0.50   0.00   0.00  99.50    0.00  0.00   0.00    0.00    507.00
>> 09:34:00 AM  3    0.50   0.00   0.00  99.00    0.00  0.50   0.00    0.00    766.50
>>
>> 09:34:00 AM  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle   intr/s
>> 09:34:02 AM  all  0.12   0.00   0.25  74.53    0.00  0.12   0.00    24.97   1751.50
>> 09:34:02 AM  0    0.00   0.00   0.00  0.00     0.00  0.00   0.00    100.00  1155.50
>> 09:34:02 AM  1    0.00   0.00   0.50  99.50    0.00  0.00   0.00    0.00    230.50
>> 09:34:02 AM  2    0.00   0.00   0.00  100.00   0.00  0.00   0.00    0.00    220.00
>> 09:34:02 AM  3    0.00   0.00   0.50  99.50    0.00  0.00   0.00    0.00    146.00
>>
>> 09:34:02 AM  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle   intr/s
>> 09:34:04 AM  all  1.25   0.00   1.50  74.97    0.00  0.00   0.00    22.28   1607.50
>> 09:34:04 AM  0    5.47   0.00   5.47  0.00     0.00  0.00   0.00    89.05   1126.00
>> 09:34:04 AM  1    0.00   0.00   0.00  100.00   0.00  0.00   0.00    0.00    158.50
>> 09:34:04 AM  2    0.00   0.00   0.50  98.51    0.50  0.50   0.00    0.00    175.50
>> 09:34:04 AM  3    0.00   0.00   0.00  100.00   0.00  0.00   0.00    0.00    147.00
>>
>> Well, sometimes you eat the bear and sometimes the bear eats you.
>>
>> Do you have any more ideas?
>> Regards,
>> Adi.
>>
>> Adrian Chadd-3 wrote:
>>>
>>> 2009/8/2 Heinz Diehl <htd@xxxxxxxxxxxxxxxxx>:
>>>
>>>> 1. Change cache_dir in squid from ufs to aufs.
>>>
>>> That is almost always a good idea for any decent performance under any
>>> sort of concurrent load. I'd like to see proof otherwise - if someone
>>> finds a case where it isn't, it indicates something that should be fixed.
>>>
>>>> 2. Format /dev/sdb1 with "mkfs.xfs -f -l lazy-count=1,version=2
>>>>    -i attr=2 -d agcount=4"
>>>> 3. Mount it afterwards using
>>>>    "rw,noatime,logbsize=256k,logbufs=2,nobarrier" in fstab.
>>>> 4. Use cfq as the standard scheduler with the Linux kernel.
>>>
>>> Just out of curiosity, why these settings? Do you have any research
>>> which shows this?
>>>
>>>> (Btw: on my systems, squid-2.7 is noticeably _a lot_ slower than
>>>> squid-3 if the object is not in cache...)
>>>
>>> This is an interesting statement. I can't think of any specific reason
>>> why squid-2.7 should perform worse than Squid-3 in this instance. This
>>> is the kind of "works by magic" stuff that deserves investigation so
>>> the issue(s) can be fully understood. Otherwise you may find that a
>>> regression creeps into later Squid-3 versions because the issues
>>> weren't fully understood and documented, and some coder makes a change
>>> which they think won't have as much of an effect as it does. It has
>>> certainly happened before in Squid. :)
>>>
>>> So, "more information please."
>>>
>>> Adrian
>>>
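For reference, Heinz's four suggestions above translate roughly into the following commands and configuration lines. This is only a sketch: the /cache mount point and the cache_dir size/L1/L2 values are placeholder assumptions for illustration, not figures from the thread.

    # 2. format the cache partition with the suggested XFS options
    mkfs.xfs -f -l lazy-count=1,version=2 -i attr=2 -d agcount=4 /dev/sdb1

    # 3. /etc/fstab entry using the suggested mount options
    /dev/sdb1   /cache   xfs   rw,noatime,logbsize=256k,logbufs=2,nobarrier   0 0

    # 4. select cfq for the cache disk at runtime (as root)
    echo cfq > /sys/block/sdb/queue/scheduler

    # 1. squid.conf: switch the cache_dir type from ufs to aufs
    cache_dir aufs /cache 20000 16 256

Whichever combination is tried, re-running the same iostat and mpstat samples under the same load is what will show whether await and %iowait actually improve.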